Preface



This book describes the MIPS R4000 and R4400 family of RISC
microprocessors (also referred to in this book as the processor).

Overview of the Contents 

Chapter 1 is a discussion (including the historical context) of RISC 
development in general, and the R4000 microprocessor in particular. 

Chapter 2 is an overview of the CPU instruction set. 

Chapter 3 describes the operation of the R4000 instruction execution 
pipeline, including the basic operation of the pipeline and 
interruptions that are caused by interlocks and exceptions. 

Chapter 4 describes the memory management system, including
address mapping and address spaces, virtual memory, the translation
lookaside buffer (TLB), and the System Control Coprocessor (CP0).

Chapter 5 describes the exception processing resources of the R4000
processor. It includes an overview of the CPU exception handling 
process and describes the format and use of each CPU exception 
handling register. 

Chapter 6 describes the Floating-Point Unit (FPU), a coprocessor for
the CPU that extends the CPU instruction set to perform floating-point
arithmetic operations. This chapter lists the FPU registers and
instructions. 

Chapter 7 describes the FPU exception processing. 

Chapter 8 describes the signals that pass between the R4000 processor 
and other components in a system. The signals discussed include the 
System interface, the Clock / Control interface, the Secondary Cache 
interface, the Interrupt interface, the Initialization interface, and the 
JTAG interface. 

Chapter 9 describes in more detail the Initialization interface, which 
includes the boot modes for the processor, as well as system resets. 

Chapter 10 describes the clocks used in the R4000 processor, as well as 
the processor status reporting mechanism. 

Chapter 11 discusses cache memory, including the operation of the 
primary and secondary caches, and cache coherency in a 
multiprocessor system. 

Chapter 12 describes the System interface, which allows the processor 
access to external resources such as memory and input/output (I/O).
It also allows an external agent access to the internal resources of the 
processor, such as the secondary cache. 

Chapter 13 describes the Secondary Cache interface, including read 
and write cycle timing. This chapter also discusses the interface buses 
and signals. 

Chapter 14 describes the Joint Test Action Group (JTAG) interface. 
The JTAG boundary scan mechanism tests the interconnections
between the R4000 processor, the printed circuit board to which it is
mounted, and other components on the board.

Chapter 15 describes the single nonmaskable processor interrupt, 
along with the six hardware and two software processor interrupts. 

Chapter 16 describes the error checking and correcting (ECC) 
mechanisms of the R4000 processor. 

Appendix A describes the R4000 CPU instructions, in both 32- and
64-bit modes. The instruction list is given in alphabetical order.

Appendix B describes the R4000 FPU instructions, listed 
alphabetically. 

Appendix C describes sub-block ordering, a nonsequential method of 
retrieving data. 

Appendix D describes the output buffer and the Δi/Δt control
mechanism.

Appendix E describes the passive components that make up the 
phase-locked loop (PLL). 

Appendix F describes Coprocessor 0 hazards. 

Appendix G describes the R4000 pinout.



A Note on Style 

A brief note on some of the stylistic conventions used in this book: bits, 
fields, and registers of interest from a software perspective are 
italicized (such as Config register); signal names of more importance 
from a hardware point of view are rendered in bold (such as Reset*). 

A range of bits uses a colon as a separator; for instance, (15:0) 
represents the 16-bit range that runs from bit 0, inclusive, through bit 
15. (In some places an ellipsis may be used in place of the colon for
visibility: (15...0).) 
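
As an illustration of the bit-range notation, the following Python sketch extracts a (high:low) field from an integer value. The function name bits and the sample value are assumptions made for this example; they are not defined by this book.

```python
def bits(value: int, high: int, low: int) -> int:
    """Return the field value(high:low), with both bit positions inclusive."""
    if high < low or low < 0:
        raise ValueError("expected high >= low >= 0")
    width = high - low + 1            # for example, (15:0) is 16 bits wide
    mask = (1 << width) - 1
    return (value >> low) & mask


word = 0x12345678                     # arbitrary sample value
low_half = bits(word, 15, 0)          # bits 15 through 0
high_half = bits(word, 31, 16)        # bits 31 through 16
```

Here low_half holds the 16-bit range (15:0) of the sample word, and high_half holds the range (31:16).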

Changes From the First Edition

The second edition of this book incorporates certain low-level changes 
and technical additions, but retains a substantive identity with the 
original version. 

Changes from the first edition are indicated by left-margin vertical 
rules. 

Getting MIPS Documents On-Line 

MIPS documents (including an electronic version of the errata) are
available on-line through the File Transfer Protocol (FTP). To
retrieve them, follow the steps below. The text you are to type is
shown in Courier Bold font; the computer's responses are shown in
Courier Regular font.

1. First, place yourself in the directory on your system within which
you want to store the retrieved files. Do this by typing:

cd <directory_you_want_file_to_be_in>

2. Access the MIPS document server, sgigate, through FTP by
typing:

ftp sgigate.sgi.com

3. The server tells you when you are connected for FTP by 
responding: 

Connected to sgigate.sgi.com. 

4. Next (after some announcements) the server asks you to log in by
requesting a name and then a password.

Name (sgigate.sgi.com:<login_name>):

5. Log in by typing anonymous for your name and your electronic
mail address for your password.

Name (sgigate.sgi.com:<login_name>): anonymous

331 Guest login ok, type your name as password.

Password: your_email_address

6. The system indicates you have successfully logged in by 
supplying an FTP prompt: 

ftp> 

7. Go to the pub/doc directory by typing:

ftp> cd pub/doc

8. You can take a look at the contents of the doc directory by listing 
them: 

ftp> ls

9. You will find several R4000-related subdirectories, such as R4200,
R4400, and R4600. When you find the subdirectory you want, cd
into that subdirectory and retrieve the file you want by typing:

get <filename>

This copies the file from sgigate back to your system.

10. When you have retrieved the files you want, exit from ftp by 
typing: 

ftp> quit 

11. If the file was encoded for transmission, you must decode it, after
retrieval, by typing:

uudecode <filename> 

12. If the file was compressed for transmission, you must uncompress
it, after retrieval, by typing:

uncompress <filename> 

13. If the file is a tar archive, extract it by typing:

tar xvof <filename>
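
The interactive session in steps 5 through 10 could also be scripted. The following is a minimal Python sketch built on the standard ftplib module; the helper name fetch_document, its parameters, and the example e-mail address are hypothetical, and the server named in the steps above may no longer be reachable.

```python
import os
from ftplib import FTP


def fetch_document(ftp: FTP, subdir: str, filename: str,
                   dest_dir: str, email: str) -> str:
    """Mirror steps 5 through 10: log in, change directory, get a file, quit.

    ftp is an already-connected ftplib.FTP-style object. All other names
    are illustrative assumptions, not part of the MIPS instructions.
    """
    ftp.login("anonymous", email)        # step 5: anonymous login
    ftp.cwd("pub/doc")                   # step 7: go to pub/doc
    ftp.cwd(subdir)                      # step 9: for example "R4400"
    dest_path = os.path.join(dest_dir, filename)
    with open(dest_path, "wb") as out:
        # step 9: the scripted equivalent of "get <filename>"
        ftp.retrbinary("RETR " + filename, out.write)
    ftp.quit()                           # step 10: leave ftp
    return dest_path


# Usage (needs network access and a live server):
# fetch_document(FTP("sgigate.sgi.com"), "R4400", "<filename>", ".", "you@example.com")
```

Passing the connection in as a parameter keeps each step visible and lets the function be exercised without a network connection. Decoding, uncompressing, and untarring (steps 11 through 13) are left to the command-line tools named above.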



Historically, the evolution of computer architectures has been dominated
by families of increasingly complex central processors. Under market
pressures to preserve existing software, complex instruction set computer
(CISC) architectures evolved by the accretion of microcode and 
increasingly intricate instruction sets. This intricacy in architecture was 
itself driven by the need to support high-level languages and operating 
systems, as advances in semiconductor technology made it possible to 
fabricate integrated circuits of greater and greater complexity. And at that 
time it seemed self-evident to designers that architectures should continue 
to become more and more complex as technological advances made such 
VLSI designs possible. 

In recent years, however, reduced instruction set computer (RISC)
architectures are implementing a different model for the interaction
between hardware, firmware, and software. RISC concepts emerged from
a statistical analysis of the way in which software actually uses processor
resources: dynamic measurement of system kernels and object modules
generated by optimizing compilers showed that the simplest instructions 
were used most often — even in the code for CISC machines. 
Correspondingly, complex instructions often went unused because their 
single way of performing a complex operation rarely matched the precise 
needs of a high-level language. 

RISC architecture eliminates microcode routines and turns low-level 
control of the machine over to software. The RISC approach is not new, 
but its application has become more prevalent in recent years, due to the 
increasing use of high-level languages, the development of compilers that 
are able to optimize at the microcode level, and dramatic advances in 
semiconductor memory and packaging. It is now feasible to replace 
relatively slow microcode ROM with faster RAM that is organized as an 
instruction cache. Machine control resides in this instruction cache that is, 
in effect, customized on-the-fly: the instruction stream generated by 
system- and compiler-generated code provides a precise fit between the 
requirements of high-level software and the low-level capabilities of the 
hardware. 

Reducing or simplifying the instruction set was not the primary goal of 
RISC architecture; it is a pleasant side effect of techniques used to gain the 
highest performance possible from available technology. Thus, the term 
reduced instruction set computers is a bit misleading; it is the push for 
performance that really drives and shapes RISC designs. 

1.1 Benefits of RISC Design 

Some benefits that result from RISC design techniques are not directly 
attributable to the drive to increase performance, but are a result of the 
basic reduction in complexity — a simpler design allows both chip-area 
resources and human resources to be applied to features that enhance 
performance. Some of these benefits are described below. 

Shorter Design Cycle

The architectures of RISC processors can be implemented more quickly 
than their CISC counterparts: it is easier to fabricate and debug a 
streamlined, simplified architecture with no microcode than a complex 
architecture that uses microcode. CISC processors have such a long 
design cycle that they may not be completely debugged by the time they 
are technologically obsolete. The shorter time required to design and 
implement RISC processors allows them to make use of the best available 
technologies. 



Effective Utilization of Chip Area 

The simplicity of RISC processors also frees scarce chip geography for 
performance-critical resources such as larger register files, translation 
lookaside buffers (TLBs), coprocessors, and fast multiply and divide units.
Such resources help RISC processors obtain an even greater performance 
edge. 



User (Programmer) Benefits 

Simplicity in architecture also helps the user by providing a uniform 
instruction set that is easier to use. This allows a closer correlation 
between the instruction count and the cycle count, making it easier to
measure code optimization activities. 



Advanced Semiconductor Technologies 

Each new VLSI technology is introduced with tight limits on the number
of transistors that fit on each chip. Since the simplicity of a RISC processor
allows it to be implemented in fewer transistors than its CISC counterpart,
the first computers capable of exploiting these new VLSI technologies
have been using and will continue to use RISC architecture.

Optimizing Compilers

RISC architecture is designed so that the compilers, not assembly 
languages, have the optimal working environment. RISC philosophy 
assumes that high-level language programming is used, which contradicts 
the older CISC philosophy that assumes assembly language programming 
is of primary importance. 

The trend toward high-level language instructions has led to the 
development of more efficient compilers to convert high-level language 
instructions to machine code. Primary measures of compiler efficiency are 
the compactness of its generated code and the shortness of its execution 
time. 

During the development of more efficient compilers, analysis of 
instruction streams revealed that the greatest amount of time was spent
executing simple instructions and performing load and store operations, 
while the more complex instructions were used less frequently. It was also 
learned that compilers produce code that is often a narrow subset of the 
processor instruction set architecture (ISA). A compiler works more 
efficiently with instructions that perform simple, well-defined operations 
and generate minimal side-effects. Compilers do not use complex 
instructions and features; the more complex, powerful instructions are 
either too difficult for the compiler to employ or those instructions do not 
precisely fit high-level language requirements. 

Thus, a natural match exists between RISC architectures and efficient,
optimizing compilers. This match makes it easier for compilers to
generate the most effective sequences of machine instructions to 
accomplish tasks defined by the high-level language. 

MIPS RISCompiler Language Suite

Some compiler products are derived from disparate sources and 
consequently do not fit together very well. Instead of treating each 
language's compiler as a separate entity, the MIPS RISCompiler™ 
language suite shares common elements across the entire family of 
compilers. In this way the language suite offers both tight integration and 
broad language coverage. 

The MIPS language suite supports: 

• industry-standard front ends for the following languages (C, 
FORTRAN, Pascal) 

• a common intermediate language, offering an efficient way to
add language front ends over time

• all of the back end optimization and code generation 

• the same object format and calling conventions 

• mixed-language programs 

• debugging of programs written in all languages, including 
mixtures 

This language suite approach yields high-quality compilers for all 
languages, since common elements make up the majority of each of the 
language products. In addition, this approach provides the ability to 
develop and execute multi-language programs, promoting flexibility in 
development, avoiding the necessity of recoding proven program 
segments, and protecting the user's software investment. The common 
back-end also exports optimizing and code-generating improvements 
immediately throughout the language suite, thereby reducing 
maintenance. 

1.2 Compatibility

The R4000 processor provides complete application software 
compatibility with the MIPS R2000, R3000, and R6000 processors. 
Although the MIPS processor architecture has evolved in response to a 
compromise between software and hardware resources in the computer 
system, the R4000 processor implements the MIPS ISA for user-mode 
programs. This guarantees that user programs conforming to the ISA 
execute on any MIPS hardware implementation. 

1.3 Processor General Features 

This section briefly describes the programming model, the memory 
management unit (MMU), and the caches in the R4000 processor. A more 
detailed description is given in succeeding sections. 

• Full 32-bit and 64-bit Operations. The R4000 processor 
contains 32 general purpose 64-bit registers. (When operating 
as a 32-bit processor, the general purpose registers are 32 bits
wide.) All instructions are 32 bits wide. 

• Efficient Pipeline. The superpipeline design of the processor
results in an execution rate approaching one instruction per
cycle. Pipeline stalls and exceptional events are handled
precisely and efficiently.

• MMU. The R4000 processor uses an on-chip TLB that provides 
rapid virtual-to-physical address translation. 

• Cache Control. The R4000 primary instruction and data caches
reside on-chip, and can each hold 8 Kbytes. In the R4400
processor, the primary caches can each hold 16 Kbytes. 
Architecturally, each primary cache can be increased to hold up 
to 32 Kbytes. An off-chip secondary cache (R4000SC and 
R4000MC processors only) can hold from 128 Kbytes to 4 
Mbytes. All processor cache control logic, including the 
secondary cache control logic, is on-chip. 

• Floating-Point Unit. The FPU is located on-chip and 
implements the ANSI/IEEE standard 754-1985. 

1.4 R4000 Processor Configurations

The R4000 processor† is packaged in three different configurations. All
processors are implemented in sub-1-micron CMOS technology.

• R4000PC is designed for cost-sensitive systems such as 
inexpensive desktop systems and high-end embedded 
controllers. It is packaged in a 179-pin PGA, and does not 
support a secondary cache. 

• R4000SC is designed for high-performance uniprocessor 
systems. It is packaged in a 447-pin LGA/PGA and includes 
integrated control for large secondary caches built from 
standard SRAMs. 

• R4000MC is designed for large cache-coherent multiprocessor 
systems. It is packaged in a 447-pin LGA/PGA and, in addition 
to the features of R4000SC, includes support for a wide variety 
of bus designs and cache-coherency mechanisms. 

Table 1-1 lists the features in each of the three configurations (X indicates 
the feature is present). R4400 processor enhancements are described in the 
section following. 

1.5 R4400 Processor Enhancements 

In addition to the features contained in the R4000 processor, the R4400 
processor has the following enhancements: 

• fully functional Status pins (described in Chapter 10) 

• Master/Checker mode (described in Chapter 16)

• larger primary caches (described in Processor General Features, 
in this chapter) 

• uncached store buffer (described in Chapter 3)

• divide-by-6 and divide-by-8 modes (described in Chapter 10) 

• cache error bit, EW, added to the CacheErr register (described in 
Chapter 5). 



† Features of the R4400 processor that differ from the R4000 processor are noted throughout
this book; for instance, R4400 processor enhancements are listed in the next section. 
Otherwise, references to the R4000 processor may be taken to include the R4400 processor. 

Table 1-1 R4000 Features

Feature                         R4000PC   R4000SC   R4000MC

Primary Cache States
  Valid                            X         X         X
  Shared                                               X
  Clean Exclusive                            X         X
  Dirty Exclusive                  X         X         X

Secondary Cache Interface                    X         X

Secondary Cache States
  Valid                            X         X         X
  Shared                                               X
  Dirty Shared                                         X
  Clean Exclusive                            X         X
  Dirty Exclusive                  X         X         X

Multiprocessing                                        X

Cache Coherency Attributes
  Uncached                         X         X         X
  Noncoherent                      X         X         X
  Sharable                                             X
  Update                                               X
  Exclusive                                            X

Packages
  PGA (179-pin)                    X
  PGA (447-pin)                              X         X

1.6 R4000 Processor

This section describes the following: 

• the 64-bit architecture of the R4000 processor 

• the superpipeline design of the CPU instruction pipeline
(described in detail in Chapter 3)

• an overview of the System interface (described in detail in 
Chapter 12) 

• an overview of the CPU registers (detailed in Chapters 4 and 5)
and CPU instruction set (detailed in Chapter 2 and Appendix A)

• data formats and byte ordering 

• the System Control Coprocessor, CP0, and the floating-point
unit, CP1

• caches and memory, including a description of primary and 
secondary caches, the memory management unit (MMU), the 
translation lookaside buffer (TLB), and the Secondary Cache 
interface (described in more detail in Chapters 4 and 11). The 
Secondary Cache interface is detailed in Chapter 13. 

64-bit Architecture 

The natural mode of operation for the R4000 processor is as a 64-bit 
microprocessor; however, 32-bit applications maintain compatibility even 
when the processor operates as a 64-bit processor. 

The R4000 processor provides the following: 

• 64-bit on-chip floating-point unit (FPU) 

• 64-bit integer arithmetic logic unit (ALU)

• 64-bit integer registers 

• 64-bit virtual address space 

• 64-bit system bus 

Figure 1-1 is a block diagram of the R4000 processor internals. 



MIPS R4000 Microprocessor User's Manual 






Chapter 1 



[Block diagram not reproduced. Labeled blocks: 64-bit System Bus;
S-cache Control; P-cache Control; Data Cache; Instruction Cache; CP0;
Exception/Control Registers; Memory Management Registers; Translation
Lookaside Buffers; CPU; CPU Registers; ALU; Load Aligner/Store Driver;
Integer Multiplier/Divider; Address Unit; PC Incrementer; Pipeline
Control; FPU; FPU Registers; Pipeline Bypass; FP Multiplier; FP Divider;
FP Add, Convert, Square Root.]

Figure 1-1 R4000 Processor Internal Block Diagram

Superpipeline Architecture

The R4000 processor exploits instruction parallelism by using an
eight-stage superpipeline which places no restrictions on the instruction
issued. Under normal circumstances, two instructions are issued each cycle.

The internal pipeline of the R4000 processor operates at twice the 
frequency of the master clock, as discussed in Chapter 3. The processor 
achieves high throughput by pipelining cache accesses, shortening 
register access times, implementing virtual-indexed primary caches, and 
allowing the latency of functional units to span more than one pipeline
clock cycle.

System Interface 

The R4000 processor supports a 64-bit System interface that can be used to
construct uniprocessor systems with a direct DRAM interface (with or
without a secondary cache) or cache-coherent multiprocessor systems. The
System interface includes:

• a 64-bit multiplexed address and data bus 

• 8 check bits 

• a 9-bit parity-protected command bus 

• 8 handshake signals 

The interface is capable of transferring data between the processor and 
memory at a peak rate of 400 Mbytes/second, when running at 50 MHz.
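
The peak figure is consistent with the bus width and clock rate: a 64-bit bus carries 8 bytes per transfer, and 8 bytes at 50 MHz is 400 Mbytes/second. The short Python check below reproduces that arithmetic, assuming one full-width data transfer per clock cycle and 1 Mbyte = 10^6 bytes; both assumptions, and the function name, are made for this sketch only.

```python
def peak_rate_mbytes_per_s(bus_bits: int, clock_hz: float) -> float:
    """Peak transfer rate, assuming one full-width transfer every cycle."""
    bytes_per_transfer = bus_bits // 8        # 64 bits -> 8 bytes
    return bytes_per_transfer * clock_hz / 1e6


rate = peak_rate_mbytes_per_s(64, 50e6)       # 64-bit bus at 50 MHz
print(rate)
```

This is an upper bound for the stated assumptions; address cycles on the multiplexed bus and handshaking reduce the sustained rate.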

CPU Register Overview

The central processing unit (CPU) provides the following registers:

• 32 general purpose registers 

• a Program Counter (PC) register 

• 2 registers that hold the results of integer multiply and divide
operations (HI and LO).

Floating-point unit (FPU) registers are described in Chapter 6. 

CPU registers can be either 32 bits or 64 bits wide, depending on the R4000 
processor mode of operation. 

Figure 1-2 shows the CPU registers. 



[Register diagram not reproduced. It shows three register groups, each
with bit positions 63, 32, 31, and 0 marked: the General Purpose
Registers (r0 through r31), the Multiply and Divide Registers (HI and
LO), and the Program Counter (PC).]

Register width depends on mode of operation: 32-bit or 64-bit

Figure 1-2 CPU Registers

Two of the CPU general purpose registers have assigned functions:

• r0 is hardwired to a value of zero, and can be used as the target
register for any instruction whose result is to be discarded. r0
can also be used as a source when a zero value is needed.

• r31 is the link register used by Jump and Link instructions. It 
should not be used by other instructions. 

The CPU has three special purpose registers: 

• PC — Program Counter register 

• HI — Multiply and Divide register higher result 

• LO — Multiply and Divide register lower result 

The two Multiply and Divide registers (HI, LO) store:

• the product of integer multiply operations, or 

• the quotient (in LO) and remainder (in HI) of integer divide 
operations 
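
The role of HI and LO can be illustrated with a small Python model. This is a sketch under stated assumptions, not a description of the hardware: it models only unsigned 32-bit operands, it assumes the high-order half of a product goes to HI and the low-order half to LO, and the function names multu and divu are chosen here for illustration.

```python
MASK32 = 0xFFFFFFFF


def multu(rs: int, rt: int):
    """Model of an unsigned 32-bit multiply: returns (HI, LO)."""
    product = (rs & MASK32) * (rt & MASK32)       # result may need 64 bits
    return (product >> 32) & MASK32, product & MASK32


def divu(rs: int, rt: int):
    """Model of an unsigned 32-bit divide: returns (HI, LO)."""
    rs, rt = rs & MASK32, rt & MASK32
    if rt == 0:
        # Choice made for this sketch only; it says nothing about hardware.
        raise ZeroDivisionError("divisor is zero")
    return rs % rt, rs // rt                      # remainder in HI, quotient in LO


hi, lo = multu(0xFFFFFFFF, 2)     # product is 0x1_FFFF_FFFE
rem, quo = divu(17, 5)            # 17 = 5 * 3 + 2
```

After the multiply, hi is 1 and lo is 0xFFFFFFFE; after the divide, rem (HI) is 2 and quo (LO) is 3.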

The R4000 processor has no Program Status Word (PSW) register as such;
this is covered by the Status and Cause registers incorporated within the
System Control Coprocessor (CP0). CP0 registers are described later in
this chapter.

CPU Instruction Set Overview

Each CPU instruction is 32 bits long. As shown in Figure 1-3, there are
three instruction formats:

• immediate (I-type) 

• jump (J-type) 

• register (R-type) 



Format               Fields (bit positions)

I-Type (Immediate)   op (31:26), rs (25:21), rt (20:16), immediate (15:0)
J-Type (Jump)        op (31:26), target (25:0)
R-Type (Register)    op (31:26), rs (25:21), rt (20:16), rd (15:11),
                     sa (10:6), funct (5:0)

Figure 1-3 CPU Instruction Formats



Each format contains a number of different instructions, which are 
described further in this chapter. Fields of the instruction formats are 
described in Chapter 2. 

Instruction decoding is greatly simplified by limiting the number of
formats to these three. This limitation means that the more complicated
(and less frequently used) operations and addressing modes can be
synthesized by the compiler, using sequences of these same simple
instructions.
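
To show how simple decoding becomes with only three fixed formats, the following Python sketch splits a 32-bit instruction word into the fields named in Figure 1-3. The function name decode and the sample word are assumptions made for this example; which fields are meaningful depends on the format of the instruction being decoded.

```python
def decode(instr: int) -> dict:
    """Split a 32-bit instruction word into the fields of all three formats."""
    instr &= 0xFFFFFFFF
    return {
        "op":        (instr >> 26) & 0x3F,    # bits 31:26, all formats
        "rs":        (instr >> 21) & 0x1F,    # bits 25:21, I- and R-type
        "rt":        (instr >> 16) & 0x1F,    # bits 20:16, I- and R-type
        "rd":        (instr >> 11) & 0x1F,    # bits 15:11, R-type
        "sa":        (instr >> 6) & 0x1F,     # bits 10:6,  R-type
        "funct":     instr & 0x3F,            # bits 5:0,   R-type
        "immediate": instr & 0xFFFF,          # bits 15:0,  I-type
        "target":    instr & 0x03FFFFFF,      # bits 25:0,  J-type
    }


# Sample I-type word assembled by hand: op=0x08, rs=4, rt=5, immediate=0x0010.
word = (0x08 << 26) | (4 << 21) | (5 << 16) | 0x0010
fields = decode(word)
```

Every field sits at a fixed position, so each one can be extracted with a single shift and mask, independent of the others.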

The instruction set can be further divided into the following groupings:

• Load and Store instructions move data between memory and 
general registers. They are all immediate (I-type) instructions, 
since the only addressing mode supported is base register plus 
16-bit, signed immediate offset. 

• Computational instructions perform arithmetic, logical, shift, 
multiply, and divide operations on values in registers. They 
include register (R-type, in which both the operands and the 
result are stored in registers) and immediate (I-type, in which 
one operand is a 16-bit immediate value) formats. 

• Jump and Branch instructions change the control flow of a
program. Jumps are always made to a paged, absolute address
formed by combining a 26-bit target address with the high-order
bits of the Program Counter (J-type format) or register
address (R-type format). Branches have 16-bit offsets relative
to the program counter (I-type). Jump And Link instructions
save their return address in register 31.

• Coprocessor instructions perform operations in the 
coprocessors. Coprocessor load and store instructions are 
I-type. 

• Coprocessor 0 (system coprocessor) instructions perform
operations on CP0 registers to control the memory
management and exception handling facilities of the processor. 
These are listed in Table 1-18. 

• Special instructions perform system calls and breakpoint 
operations. These instructions are always R-type. 

• Exception instructions cause a branch to the general exception- 
handling vector based upon the result of a comparison. These 
instructions occur in both R-type (both the operands and the 
result are registers) and I-type (one operand is a 16-bit 
immediate value) formats. 

Chapter 2 provides a more detailed summary and Appendix A gives a 
complete description of each instruction. 
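
The address calculations mentioned in the groupings above can be sketched in Python as follows. This is an illustrative model, not the processor's definition: it assumes 32-bit addresses, it assumes that jump targets and branch offsets count 4-byte instruction words (so they are shifted left by two bits), and it takes pc to be whatever program counter value the hardware uses as its reference. All function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(value: int) -> int:
    """Interpret the low 16 bits of value as a signed two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def load_store_address(base: int, offset16: int) -> int:
    """Base register plus 16-bit signed immediate offset."""
    return (base + sign_extend16(offset16)) & MASK32


def jump_address(pc: int, target26: int) -> int:
    """Combine the 26-bit target with the high-order bits of the PC."""
    return (pc & 0xF0000000) | ((target26 & 0x03FFFFFF) << 2)


def branch_address(pc: int, offset16: int) -> int:
    """PC-relative branch using a signed 16-bit offset counted in words."""
    return (pc + (sign_extend16(offset16) << 2)) & MASK32


addr = load_store_address(0x1000, 0xFFFC)      # offset 0xFFFC is -4
dest = jump_address(0x80001234, 0x40)          # stays in the PC's region
back = branch_address(0x1000, 0xFFFF)          # offset -1 word
```

Because the jump keeps the high-order PC bits, a J-type jump can only reach addresses within the current region of the address space, which is why the text calls the address paged.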

Tables 1-2 through 1-17 list CPU instructions common to MIPS R-Series
processors, along with those instructions that are extensions to the
instruction set architecture. The extensions result in code space
reductions, multiprocessor support, and improved performance in
operating system kernel code sequences, for instance in situations where
run-time bounds-checking is frequently performed. Table 1-18 lists CP0
instructions.



Table 1-2 CPU Instruction Set: Load and Store Instructions

OpCode     Description

LB         Load Byte
LBU        Load Byte Unsigned
LH         Load Halfword
LHU        Load Halfword Unsigned
LW         Load Word
LWL        Load Word Left
LWR        Load Word Right
SB         Store Byte
SH         Store Halfword
SW         Store Word
SWL        Store Word Left
SWR        Store Word Right


Table 1-3 CPU Instruction Set: Arithmetic Instructions (ALU Immediate)

OpCode     Description

ADDI       Add Immediate
ADDIU      Add Immediate Unsigned
SLTI       Set on Less Than Immediate
SLTIU      Set on Less Than Immediate Unsigned
ANDI       AND Immediate
ORI        OR Immediate
XORI       Exclusive OR Immediate
LUI        Load Upper Immediate


Table 1-4 CPU Instruction Set: Arithmetic (3-Operand, R-Type)

OpCode     Description

ADD        Add
ADDU       Add Unsigned
SUB        Subtract
SUBU       Subtract Unsigned
SLT        Set on Less Than
SLTU       Set on Less Than Unsigned
AND        AND
OR         OR
XOR        Exclusive OR
NOR        NOR


Table 1-5 CPU Instruction Set: Multiply and Divide Instructions

OpCode     Description

MULT       Multiply
MULTU      Multiply Unsigned
DIV        Divide
DIVU       Divide Unsigned
MFHI       Move From HI
MTHI       Move To HI
MFLO       Move From LO
MTLO       Move To LO


Table 1-6 CPU Instruction Set: Jump and Branch Instructions

OpCode     Description

J          Jump
JAL        Jump And Link
JR         Jump Register
JALR       Jump And Link Register
BEQ        Branch on Equal
BNE        Branch on Not Equal
BLEZ       Branch on Less Than or Equal to Zero
BGTZ       Branch on Greater Than Zero
BLTZ       Branch on Less Than Zero
BGEZ       Branch on Greater Than or Equal to Zero
BLTZAL     Branch on Less Than Zero And Link
BGEZAL     Branch on Greater Than or Equal to Zero And Link


Table 1-7 CPU Instruction Set: Shift Instructions

OpCode     Description

SLL        Shift Left Logical
SRL        Shift Right Logical
SRA        Shift Right Arithmetic
SLLV       Shift Left Logical Variable
SRLV       Shift Right Logical Variable
SRAV       Shift Right Arithmetic Variable


Table 1-8 CPU Instruction Set: Coprocessor Instructions

OpCode     Description

LWCz       Load Word to Coprocessor z
SWCz       Store Word from Coprocessor z
MTCz       Move To Coprocessor z
MFCz       Move From Coprocessor z
CTCz       Move Control To Coprocessor z
CFCz       Move Control From Coprocessor z
COPz       Coprocessor Operation z
BCzT       Branch on Coprocessor z True
BCzF       Branch on Coprocessor z False


Table 1-9 CPU Instruction Set: Special Instructions

OpCode     Description

SYSCALL    System Call
BREAK      Break


Table 1-10 Extensions to the ISA: Load and Store Instructions

OpCode     Description

LD         Load Doubleword
LDL        Load Doubleword Left
LDR        Load Doubleword Right
LL         Load Linked
LLD        Load Linked Doubleword
LWU        Load Word Unsigned
SC         Store Conditional
SCD        Store Conditional Doubleword
SD         Store Doubleword
SDL        Store Doubleword Left
SDR        Store Doubleword Right
SYNC       Sync


Table 1-11 Extensions to the ISA: Arithmetic Instructions (ALU Immediate)

OpCode     Description

DADDI      Doubleword Add Immediate
DADDIU     Doubleword Add Immediate Unsigned


Table 1-12 Extensions to the ISA: Multiply and Divide Instructions

OpCode     Description

DMULT      Doubleword Multiply
DMULTU     Doubleword Multiply Unsigned
DDIV       Doubleword Divide
DDIVU      Doubleword Divide Unsigned


Table 1-13 Extensions to the ISA: Branch Instructions

OpCode     Description

BEQL       Branch on Equal Likely
BNEL       Branch on Not Equal Likely
BLEZL      Branch on Less Than or Equal to Zero Likely
BGTZL      Branch on Greater Than Zero Likely
BLTZL      Branch on Less Than Zero Likely
BGEZL      Branch on Greater Than or Equal to Zero Likely
BLTZALL    Branch on Less Than Zero And Link Likely
BGEZALL    Branch on Greater Than or Equal to Zero And Link Likely
BCzTL      Branch on Coprocessor z True Likely
BCzFL      Branch on Coprocessor z False Likely


Table 1-14 Extensions to the ISA: Arithmetic Instructions (3-operand, R-type)

OpCode     Description

DADD       Doubleword Add
DADDU      Doubleword Add Unsigned
DSUB       Doubleword Subtract
DSUBU      Doubleword Subtract Unsigned
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Table 1-15 Extensions to the ISA: Shift Instructions 



OpCode 


Description 


DSLL 


Doubleword Shift Left Logical 


DSRL 


Doubleword Shift Right Logical 


DSRA 


Doubleword Shift Right Arithmetic 


DSLLV 


Doubleword Shift Left Logical Variable 


DSRLV 


Doubleword Shift Right Logical Variable 


DSRAV 


Doubleword Shift Right Arithmetic Variable 


DSLL32 


Doubleword Shift Left Logical + 32 


DSRL32 


Doubleword Shift Right Logical + 32 


DSRA32 


Doubleword Shift Right Arithmetic + 32 


Table 1-16 Extensions to the ISA: Exception Instructions

OpCode      Description
TGE         Trap if Greater Than or Equal
TGEU        Trap if Greater Than or Equal Unsigned
TLT         Trap if Less Than
TLTU        Trap if Less Than Unsigned
TEQ         Trap if Equal
TNE         Trap if Not Equal
TGEI        Trap if Greater Than or Equal Immediate
TGEIU       Trap if Greater Than or Equal Immediate Unsigned
TLTI        Trap if Less Than Immediate
TLTIU       Trap if Less Than Immediate Unsigned
TEQI        Trap if Equal Immediate
TNEI        Trap if Not Equal Immediate

Table 1-17 Extensions to the ISA: Coprocessor Instructions

OpCode      Description
DMFCz       Doubleword Move From Coprocessor z
DMTCz       Doubleword Move To Coprocessor z
LDCz        Load Double Coprocessor z
SDCz        Store Double Coprocessor z


Table 1-18 CP0 Instructions

OpCode      Description
DMFC0       Doubleword Move From CP0
DMTC0       Doubleword Move To CP0
MTC0        Move to CP0
MFC0        Move from CP0
TLBR        Read Indexed TLB Entry
TLBWI       Write Indexed TLB Entry
TLBWR       Write Random TLB Entry
TLBP        Probe TLB for Matching Entry
CACHE       Cache Operation
ERET        Exception Return

Data Formats and Addressing

The R4000 processor uses four data formats: a 64-bit doubleword, a 32-bit
word, a 16-bit halfword, and an 8-bit byte. Byte ordering within each of
the larger data formats (halfword, word, doubleword) can be
configured in either big-endian or little-endian order. Endianness refers
to the location of byte 0 within the multi-byte data structure. Figures 1-4
and 1-5 show the ordering of bytes within words and the ordering of
words within multiple-word structures for the big-endian and
little-endian conventions.

When the R4000 processor is configured as a big-endian system, byte 0 is 
the most-significant (leftmost) byte, thereby providing compatibility with 
MC 68000® and IBM 370® conventions. Figure 1-4 shows this 
configuration. 



[Diagram not reproduced. Labels: Higher Address, Word Address, Bit #; the word shown holds bytes 0, 1, 2, 3 from left to right.]

Figure 1-4 Big-Endian Byte Ordering

When configured as a little-endian system, byte 0 is always the
least-significant (rightmost) byte, which is compatible with iAPX® x86 and DEC
VAX® conventions. Figure 1-5 shows this configuration.

Figure 1-5 Little-Endian Byte Ordering

In this text, bit 0 is always the least-significant (rightmost) bit; thus, bit
designations are always little-endian (although no instructions explicitly 
designate bit positions within words). 
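
The two byte-ordering conventions can be illustrated with a short Python sketch. It is not part of the processor documentation; the value 0x11223344 is an arbitrary example, and the sketch only shows which byte ends up at the lowest address (byte 0) under each convention.

```python
# Illustrative only: byte layout of one 32-bit word under each convention.
value = 0x11223344  # arbitrary example word

# Big-endian: byte 0 (lowest address) is the most-significant byte.
big = value.to_bytes(4, byteorder="big")

# Little-endian: byte 0 (lowest address) is the least-significant byte.
little = value.to_bytes(4, byteorder="little")


def byte0(word_bytes: bytes) -> int:
    """Return the byte stored at the lowest address (byte 0)."""
    return word_bytes[0]


print(hex(byte0(big)))     # most-significant byte of the value
print(hex(byte0(little)))  # least-significant byte of the value
```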

Figures 1-6 and 1-7 show little-endian and big-endian byte ordering in 
doublewords. 



[Diagram not reproduced. Labels: Most-significant byte, Least-significant byte, Word, Halfword, Byte, Bit #.]

Figure 1-6 Little-Endian Data in a Doubleword

Figure 1-7 Big-Endian Data in a Doubleword

The CPU uses byte addressing for halfword, word, and doubleword
accesses with the following alignment constraints:

• Halfword accesses must be aligned on an even byte boundary 
(0, 2, 4...). 

• Word accesses must be aligned on a byte boundary divisible by 
four (0, 4, 8...). 

• Doubleword accesses must be aligned on a byte boundary 
divisible by eight (0, 8, 16...). 
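
The alignment constraints above reduce to a divisibility test on the byte address. The following Python sketch shows one way to express that test; the function and table names are hypothetical and chosen only for illustration.

```python
# Access sizes in bytes, taken from the alignment constraints listed above.
ACCESS_SIZE = {"halfword": 2, "word": 4, "doubleword": 8}


def is_aligned(address: int, access: str) -> bool:
    """Return True if `address` satisfies the alignment rule for `access`."""
    return address % ACCESS_SIZE[access] == 0


print(is_aligned(6, "halfword"))     # 6 is an even byte boundary
print(is_aligned(6, "word"))         # 6 is not divisible by four
print(is_aligned(16, "doubleword"))  # 16 is divisible by eight
```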

The following special instructions load and store words that are not
aligned on 4-byte (word) or 8-byte (doubleword) boundaries:

LWL   LWR   SWL   SWR
LDL   LDR   SDL   SDR

These instructions are used in pairs to provide addressing of misaligned 
words. Addressing misaligned data incurs one additional instruction 
cycle over that required for addressing aligned data. 

Figures 1-8 and 1-9 show the access of a misaligned word that has byte 
address 3. 
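
As a rough illustration of how a pair of these instructions assembles a misaligned word, the Python sketch below models LWL and LWR for the big-endian case only. It is a simplified model under stated assumptions: registers are treated as 32 bits wide (sign extension into a 64-bit register is ignored), memory is a plain byte string, and the function names are hypothetical. Appendix A remains the authoritative description of these instructions.

```python
MASK32 = 0xFFFFFFFF


def aligned_word(mem: bytes, addr: int) -> int:
    """Big-endian value of the aligned word that contains byte `addr`."""
    base = addr & ~3
    return int.from_bytes(mem[base:base + 4], "big")


def lwl(reg: int, mem: bytes, addr: int) -> int:
    """Load Word Left: bytes from `addr` to the end of its aligned word
    are placed in the left (most-significant) part of the register."""
    shift = 8 * (addr & 3)
    keep = (1 << shift) - 1  # low-order register bytes stay unchanged
    return ((aligned_word(mem, addr) << shift) & MASK32) | (reg & keep)


def lwr(reg: int, mem: bytes, addr: int) -> int:
    """Load Word Right: bytes from the start of the aligned word up to
    `addr` are placed in the right (least-significant) part of the register."""
    shift = 8 * (3 - (addr & 3))
    keep = (MASK32 << (32 - shift)) & MASK32  # high-order bytes stay unchanged
    return (reg & keep) | (aligned_word(mem, addr) >> shift)


# Memory bytes 0..7 hold the values 0x00..0x07; load the word at byte address 3.
mem = bytes(range(8))
reg = lwl(0, mem, 3)      # picks up byte 3 from the first aligned word
reg = lwr(reg, mem, 3 + 3)  # picks up bytes 4..6 from the second aligned word
print(hex(reg))
```

The pair touches two aligned words, which is consistent with the extra instruction cycle the text attributes to misaligned accesses.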

[Diagram not reproduced. Labels: Higher Address, Lower Address, Bit #; the misaligned word at byte address 3 occupies bytes 3 through 6.]

Figure 1-8 Big-Endian Misaligned Word Addressing

Figure 1-9 Little-Endian Misaligned Word Addressing

Coprocessors (CP0-CP2)

The MIPS ISA defines three coprocessors (designated CP0 through CP2):

• Coprocessor 0 (CP0) is incorporated on the CPU chip and
supports the virtual memory system and exception handling.
CP0 is also referred to as the System Control Coprocessor.

• Coprocessor 1 (CP1) is reserved for the on-chip, floating-point
coprocessor, the FPU.

• Coprocessor 2 (CP2) is reserved for future definition by MIPS.

CP0 and CP1 are described in the sections that follow.



System Control Coprocessor, CP0

CP0 translates virtual addresses into physical addresses and manages
exceptions and transitions between kernel, supervisor, and user states.
CP0 also controls the cache subsystem, as well as providing diagnostic
control and error recovery facilities.

The CP0 registers shown in Figure 1-10 and described in Table 1-19
manipulate the memory management and exception handling capabilities
of the CPU.

Table 1-19 System Control Coprocessor (CP0) Register Definitions

Number   Register   Description
0        Index      Programmable pointer into TLB array
1        Random     Pseudorandom pointer into TLB array (read only)
2        EntryLo0   Low half of TLB entry for even virtual address (VPN)
3        EntryLo1   Low half of TLB entry for odd virtual address (VPN)
4        Context    Pointer to kernel virtual page table entry (PTE) in 32-bit addressing mode
5        PageMask   TLB Page Mask
6        Wired      Number of wired TLB entries
7        -          Reserved
8        BadVAddr   Bad virtual address
9        Count      Timer Count
10       EntryHi    High half of TLB entry
11       Compare    Timer Compare
12       SR         Status register
13       Cause      Cause of last exception
14       EPC        Exception Program Counter
15       PRId       Processor Revision Identifier
16       Config     Configuration register
17       LLAddr     Load Linked Address
18       WatchLo    Memory reference trap address low bits
19       WatchHi    Memory reference trap address high bits
20       XContext   Pointer to kernel virtual PTE table in 64-bit addressing mode
21-25    -          Reserved
26       ECC        Secondary-cache error checking and correcting (ECC) and Primary parity
27       CacheErr   Cache Error and Status register
28       TagLo      Cache Tag register
29       TagHi      Cache Tag register
30       ErrorEPC   Error Exception Program Counter
31       -          Reserved

Floating-Point Unit (FPU), CP1

The MIPS floating-point unit (FPU) is designated CP1; the FPU extends
the CPU instruction set to perform arithmetic operations on floating-point
values. The FPU, with associated system software, fully conforms to the
requirements of ANSI/IEEE Standard 754-1985, IEEE Standard for Binary
Floating-Point Arithmetic.

The FPU features include:

• Full 64-bit Operation. The FPU can contain either 16 or 32
64-bit registers to hold single-precision or double-precision
values. The FPU also includes a 32-bit Status/Control register
that provides access to all IEEE-Standard exception handling
capabilities.

• Load and Store Instruction Set. Like the CPU, the FPU uses a
load- and store-based instruction set. Floating-point operations
are started in a single cycle and their execution overlaps other
fixed-point or floating-point operations.

• Tightly-coupled Coprocessor Interface. The FPU is on the
CPU chip, and appears to the programmer as a simple
extension of the CPU (accessed as CP1). Together, the CPU and
FPU form a tightly-coupled unit with a seamless integration of
floating-point and fixed-point instruction sets. Since each unit
receives and executes instructions in parallel, some
floating-point instructions can execute at the same rate (two
instructions per cycle) as fixed-point instructions.

Memory Management System (MMU)

The R4000 processor has a 36-bit physical addressing range of 64 Gbytes.
However, since it is rare for systems to implement a physical memory
space this large, the CPU provides a logical expansion of memory space by
translating addresses composed in the large virtual address space into
available physical memory addresses. The R4000 processor supports the
following two addressing modes:

• 32-bit mode, in which the virtual address space is divided into
2 Gbytes per user process and 2 Gbytes for the kernel.

• 64-bit mode, in which the virtual address is expanded to
1 Tbyte (2^40 bytes) of user virtual address space.

A detailed description of these address spaces is given in Chapter 4. 

The Translation Lookaside Buffer (TLB) 

Virtual memory mapping is assisted by a translation lookaside buffer, 
which caches virtual-to-physical address translations. This fully- 
associative, on-chip TLB contains 48 entries, each of which maps a pair of 
variable-sized pages ranging from 4 Kbytes to 16 Mbytes, in multiples of 
four. 
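
The sizes quoted in this section can be checked with a few lines of arithmetic. The Python sketch below is illustrative only: it enumerates the page sizes implied by "4 Kbytes to 16 Mbytes, in multiples of four" and confirms the 64-Gbyte and 1-Tbyte figures given for the physical and 64-bit user virtual address ranges. The variable names are hypothetical.

```python
KB = 1 << 10
MB = 1 << 20
GB = 1 << 30
TB = 1 << 40

# Page sizes from 4 Kbytes to 16 Mbytes, each four times the previous one.
page_sizes = [4 * KB * 4 ** i for i in range(7)]

# A 36-bit physical address reaches 64 Gbytes; 2^40 bytes is 1 Tbyte.
physical_range = 1 << 36
user_virtual_64 = 1 << 40

# Each of the 48 TLB entries maps a pair of pages.
pages_mapped = 48 * 2

print([size // KB for size in page_sizes])  # sizes in Kbytes
print(physical_range // GB, user_virtual_64 // TB, pages_mapped)
```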

Instruction TLB 

The R4000 processor has a two-entry instruction TLB (ITLB) which assists 
in instruction address translation. The ITLB is completely invisible to 
software and exists only to increase performance. 

Joint TLB 

An address translation value is tagged with the most-significant bits of its
virtual address (the number of these bits depends upon the size of the
page) and a per-process identifier. If there is no matching entry in the TLB,
an exception is taken and software refills the on-chip TLB from a page
table resident in memory; this TLB is referred to as the joint TLB (JTLB)
because it contains both data and instructions jointly. The JTLB entry to
be rewritten is selected at random.

Operating Modes

The R4000 processor has three operating modes: 

• User mode 

• Supervisor mode 

• Kernel mode 

The manner in which memory addresses are translated or mapped depends 
on the operating mode of the CPU; this is described in Chapter 4. 

Cache Memory Hierarchy 

To achieve high performance in uniprocessor and multiprocessor
systems, the R4000 processor supports a two-level cache memory 
hierarchy that increases memory access bandwidth and reduces the 
latency of load and store instructions. This hierarchy consists of on-chip 
instruction and data caches, together with an optional external secondary 
cache that varies in size from 128 Kbytes to 4 Mbytes. 

The secondary cache is assumed to consist of one bank of industry- 
standard static RAM (SRAM) with output enables, arranged as a 
quadword (128-bit) data array, with a 25-bit-wide tag array. Check fields 
are added to both data and tag arrays to improve data integrity. 

The secondary cache can be configured as a joint cache, or split into
separate instruction and data caches. The maximum secondary cache size
is 4 Mbytes; the minimum secondary cache size is 128 Kbytes for a joint
cache, or 256 Kbytes total for split instruction/data caches. The secondary
cache is direct mapped, and is addressed with the lower part of the
physical address.

Primary and secondary caches are described in more detail in Chapter 11.

Primary Caches

The R4000 processor incorporates separate on-chip primary instruction 
and data caches to fill the high-performance pipeline. Each cache has its 
own 64-bit data path, and each can be accessed in parallel. 

The R4000 processor primary caches hold from 8 Kbytes to 32 Kbytes; the 
R4400 processor primary caches are fixed at 16 Kbytes. 

Cache accesses can occur up to twice each cycle. This provides the integer 
and floating-point units with an aggregate bandwidth of 1.6 Gbytes per 
second at a MasterClock frequency of 50 MHz. 
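
The 1.6-Gbyte figure can be reproduced with simple arithmetic. The Python sketch below is one plausible reading rather than a statement of how the number was derived: it assumes that "cycle" means a MasterClock cycle, that both primary caches are counted, and that each access moves 64 bits over the data path described above.

```python
master_clock_hz = 50_000_000      # MasterClock frequency quoted in the text
accesses_per_cycle = 2            # "up to twice each cycle" (assumed: per MasterClock cycle)
caches = 2                        # instruction cache and data cache, accessed in parallel
bytes_per_access = 64 // 8        # each cache has a 64-bit data path

# Aggregate bandwidth under the assumptions above, in bytes per second.
bandwidth = caches * accesses_per_cycle * bytes_per_access * master_clock_hz

print(bandwidth / 1e9, "Gbytes per second")
```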

Secondary Cache Interface 

The R4000SC (secondary cache) and R4000MC (multiprocessor) versions 
of the processor allow connection to an optional secondary cache. These 
processors provide all of the secondary cache control circuitry, including 
error checking and correcting (ECC) protection, on chip. 

The Secondary Cache interface includes: 

• a 128-bit data bus 

• a 25-bit tag bus 

• an 18-bit address bus 

• SRAM control signals 

The 128-bit-wide data bus is designed to minimize cache miss penalties,
and allow the use of standard low-cost SRAM in secondary cache.

2



This chapter is an overview of the central processing unit (CPU) 
instruction set; refer to Appendix A for detailed descriptions of individual 
CPU instructions. 

An overview of the floating-point unit (FPU) instruction set is in
Chapter 6; refer to Appendix B for detailed descriptions of individual FPU
instructions.

2.1 CPU Instruction Formats

Each CPU instruction consists of a single 32-bit word, aligned on a word
boundary. There are three instruction formats: immediate (I-type), jump
(J-type), and register (R-type), as shown in Figure 2-1. The use of a small
number of instruction formats simplifies instruction decoding, allowing
the compiler to synthesize more complicated (and less frequently used)
operations and addressing modes from these three formats as needed.



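I-Type (Immediate)

 31    26 25    21 20    16 15                          0
+--------+--------+--------+-----------------------------+
|   op   |   rs   |   rt   |          immediate          |
+--------+--------+--------+-----------------------------+

J-Type (Jump)

 31    26 25                                            0
+--------+-----------------------------------------------+
|   op   |                    target                     |
+--------+-----------------------------------------------+

R-Type (Register)

 31    26 25    21 20    16 15    11 10     6 5         0
+--------+--------+--------+--------+--------+-----------+
|   op   |   rs   |   rt   |   rd   |   sa   |   funct   |
+--------+--------+--------+--------+--------+-----------+

op          6-bit operation code
rs          5-bit source register specifier
rt          5-bit target (source/destination) register or branch condition
immediate   16-bit immediate value, branch displacement or address displacement
target      26-bit jump target address
rd          5-bit destination register specifier
sa          5-bit shift amount
funct       6-bit function field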



Figure 2-1 CPU Instruction Formats 
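
The field boundaries in Figure 2-1 translate directly into shifts and masks. The Python sketch below extracts every field from a 32-bit instruction word; it is an illustration only, the function name is hypothetical, and the example word is arbitrary. Which fields are meaningful depends on the instruction's format, so the sketch simply returns all of them.

```python
def decode_fields(word: int) -> dict:
    """Split a 32-bit instruction word into the fields named in Figure 2-1."""
    return {
        "op": (word >> 26) & 0x3F,      # bits 31..26
        "rs": (word >> 21) & 0x1F,      # bits 25..21
        "rt": (word >> 16) & 0x1F,      # bits 20..16
        "rd": (word >> 11) & 0x1F,      # bits 15..11 (R-type)
        "sa": (word >> 6) & 0x1F,       # bits 10..6  (R-type)
        "funct": word & 0x3F,           # bits 5..0   (R-type)
        "immediate": word & 0xFFFF,     # bits 15..0  (I-type)
        "target": word & 0x03FFFFFF,    # bits 25..0  (J-type)
    }


fields = decode_fields(0x00221821)  # arbitrary example word
print(fields)
```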

In the MIPS architecture, coprocessor instructions are
implementation-dependent; see Appendix A for details of individual Coprocessor 0
instructions.

Load and Store Instructions

Load and store are immediate (I-type) instructions that move data
between memory and the general registers. The only addressing mode 
that load and store instructions directly support is base register plus 16-bit 
signed immediate offset. 
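
The single addressing mode can be written as a small calculation: sign-extend the 16-bit immediate, then add it to the base register. The Python sketch below shows this for a 32-bit address width by default; the helper names and the `bits` parameter are assumptions made for the example.

```python
def sign_extend16(imm: int) -> int:
    """Interpret the low 16 bits of `imm` as a signed value."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm


def effective_address(base: int, offset16: int, bits: int = 32) -> int:
    """Base register plus sign-extended 16-bit offset, wrapped to `bits` bits."""
    return (base + sign_extend16(offset16)) & ((1 << bits) - 1)


print(hex(effective_address(0x1000, 0x0004)))  # positive offset
print(hex(effective_address(0x1000, 0xFFFC)))  # offset of -4
```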

Scheduling a Load Delay Slot 

A load instruction that does not allow its result to be used by the 
instruction immediately following is called a delayed load instruction. The 
instruction slot immediately following this delayed load instruction is 
referred to as the load delay slot. 

In the R4000 processor, the instruction immediately following a load
instruction can use the contents of the loaded register; however, in such
cases hardware interlocks insert additional real cycles. Consequently,
scheduling load delay slots can be desirable, both for performance and
R-Series processor compatibility. However, the scheduling of load delay
slots is not absolutely required.

Defining Access Types 

Access type indicates the size of an R4000 processor data item to be loaded 
or stored, set by the load or store instruction opcode. Access types are 
defined in Appendix A. 

Regardless of access type or byte ordering (endianness), the address given 
specifies the low-order byte in the addressed field. For a big-endian 
configuration, the low-order byte is the most-significant byte; for a
little-endian configuration, the low-order byte is the least-significant byte.†

The access type, together with the three low-order bits of the address, 
define the bytes accessed within the addressed doubleword (shown in 
Table 2-1). Only the combinations shown in Table 2-1 are permissible; 
other combinations cause address error exceptions. See Appendix A for 
individual descriptions of CPU load and store instructions. 
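
The rule that only certain combinations of access type and low-order address bits are permissible can be sketched as a lookup. In the Python example below, the `PERMITTED` table is transcribed from Table 2-1 as best it can be read and should be checked against Appendix A; the function name is hypothetical, and a Python exception stands in for the address error exception.

```python
# Access type value -> permitted values of the three low-order address bits.
# Transcribed from Table 2-1 (an assumption to verify against Appendix A).
PERMITTED = {
    7: {0},                 # Doubleword
    6: {0, 1},              # Septibyte
    5: {0, 2},              # Sextibyte
    4: {0, 3},              # Quintibyte
    3: {0, 4},              # Word
    2: {0, 1, 4, 5},        # Triplebyte
    1: {0, 2, 4, 6},        # Halfword
    0: set(range(8)),       # Byte
}


def bytes_accessed(access_type: int, address: int) -> list:
    """Return the byte numbers touched within the addressed doubleword."""
    low = address & 7
    if low not in PERMITTED[access_type]:
        raise ValueError("address error exception")
    # An access of type n covers n + 1 consecutive bytes starting at `low`.
    return list(range(low, low + access_type + 1))


print(bytes_accessed(3, 0x104))  # word access in the upper half
print(bytes_accessed(2, 0x005))  # triplebyte access
```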



† Data formats are described in Chapter 1.

Table 2-1 Byte Access within a Doubleword

In the byte columns, the leftmost position corresponds to bits 63..56 of the
doubleword and the rightmost to bits 7..0. A period marks a byte position
that is not accessed.

Access Type        Low-Order       Bytes Accessed
Mnemonic (Value)   Address Bits    Big endian          Little endian
                   2  1  0         (63 ... 31 ... 0)   (63 ... 31 ... 0)

Doubleword (7)     0  0  0         0 1 2 3 4 5 6 7     7 6 5 4 3 2 1 0

Septibyte (6)      0  0  0         0 1 2 3 4 5 6 .     . 6 5 4 3 2 1 0
                   0  0  1         . 1 2 3 4 5 6 7     7 6 5 4 3 2 1 .

Sextibyte (5)      0  0  0         0 1 2 3 4 5 . .     . . 5 4 3 2 1 0
                   0  1  0         . . 2 3 4 5 6 7     7 6 5 4 3 2 . .

Quintibyte (4)     0  0  0         0 1 2 3 4 . . .     . . . 4 3 2 1 0
                   0  1  1         . . . 3 4 5 6 7     7 6 5 4 3 . . .

Word (3)           0  0  0         0 1 2 3 . . . .     . . . . 3 2 1 0
                   1  0  0         . . . . 4 5 6 7     7 6 5 4 . . . .

Triplebyte (2)     0  0  0         0 1 2 . . . . .     . . . . . 2 1 0
                   0  0  1         . 1 2 3 . . . .     . . . . 3 2 1 .
                   1  0  0         . . . . 4 5 6 .     . 6 5 4 . . . .
                   1  0  1         . . . . . 5 6 7     7 6 5 . . . . .

Halfword (1)       0  0  0         0 1 . . . . . .     . . . . . . 1 0
                   0  1  0         . . 2 3 . . . .     . . . . 3 2 . .
                   1  0  0         . . . . 4 5 . .     . . 5 4 . . . .
                   1  1  0         . . . . . . 6 7     7 6 . . . . . .

Byte (0)           0  0  0         0 . . . . . . .     . . . . . . . 0
                   0  0  1         . 1 . . . . . .     . . . . . . 1 .
                   0  1  0         . . 2 . . . . .     . . . . . 2 . .
                   0  1  1         . . . 3 . . . .     . . . . 3 . . .
                   1  0  0         . . . . 4 . . .     . . . 4 . . . .
                   1  0  1         . . . . . 5 . .     . . 5 . . . . .
                   1  1  0         . . . . . . 6 .     . 6 . . . . . .
                   1  1  1         . . . . . . . 7     7 . . . . . . .

Computational Instructions

Computational instructions can be either in register (R-type) format, in
which both operands are registers, or in immediate (I-type) format, in
which one operand is a 16-bit immediate.

Computational instructions perform the following operations on register 
values: 

• arithmetic 

• logical 

• shift 

• multiply 

• divide 

These operations fit in the following four categories of computational 
instructions: 

• ALU Immediate instructions 

• three-operand register-type instructions

• shift instructions 

• multiply and divide instructions 

64-bit Operations 

When operating in 64-bit mode, 32-bit operands must be sign extended. 
The result of operations that use incorrectly sign-extended 32-bit values is
unpredictable. 
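
Sign extension of a 32-bit operand into a 64-bit register copies bit 31 into bits 63..32. The Python sketch below shows the operation and a matching check; the function names are hypothetical and the code models register contents as unsigned integers.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF


def sign_extend_32_to_64(value: int) -> int:
    """Return the 64-bit register image of a 32-bit operand."""
    value &= 0xFFFFFFFF
    if value & 0x80000000:           # bit 31 set: fill bits 63..32 with ones
        value |= 0xFFFFFFFF00000000
    return value


def is_sign_extended(reg64: int) -> bool:
    """True if a 64-bit register holds a correctly sign-extended 32-bit value."""
    reg64 &= MASK64
    return sign_extend_32_to_64(reg64) == reg64


print(hex(sign_extend_32_to_64(0x80000000)))
print(is_sign_extended(0x0000000080000000))  # upper half not filled with ones
```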

Cycle Timing for Multiply and Divide Instructions

Any multiply instruction in the integer pipeline is transferred to the
multiplier as remaining instructions continue through the pipeline; the
product of the multiply instruction is saved in the HI and LO registers.

If the multiply instruction is followed by an MFHI or MFLO before the
product is available, the pipeline interlocks until this product does become
available.

Table 2-2 gives the execution time for integer multiply and divide 
operations. The "Total Cycles" column gives the total number of cycles 
required to execute the instruction. The "Overlap" column gives the 
number of cycles that overlap other CPU operations; that is, the number of 
cycles required between the present instruction and a subsequent MFHI or 
MFLO without incurring an interlock. If this value is zero, the operation 
is not performed in parallel with any other CPU operation. 



Table 2-2 Multiply/Divide Instruction Cycle Timing

Instruction   Total Cycles   Overlap
MULT          12             10
MULTU         12             10
DIV           75             0
DIVU          75             0
DMULT         20             18
DMULTU        20             18
DDIV          139            0
DDIVU         139            0

For more information about computational instructions, refer to the
individual instruction as described in Appendix A.

Jump and Branch Instructions

Jump and branch instructions change the control flow of a program. All
jump and branch instructions occur with a delay of one instruction: that is,
the instruction immediately following the jump or branch (this is known
as the instruction in the delay slot) always executes while the target
instruction is being fetched from storage.†

Overview of Jump Instructions 

Subroutine calls in high-level languages are usually implemented with 
Jump or Jump and Link instructions, both of which are J-type instructions.
In J-type format, the 26-bit target address shifts left 2 bits and combines
with the high-order 4 bits of the current program counter to form an
absolute address. 
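
The J-type address calculation can be sketched in a few lines of Python. This models the 32-bit case described above; the function name is hypothetical, and `pc` stands for whatever program counter value supplies the high-order four bits.

```python
def jump_target(pc: int, target26: int) -> int:
    """Combine the high-order 4 bits of `pc` with the 26-bit target shifted left 2."""
    return (pc & 0xF0000000) | ((target26 & 0x03FFFFFF) << 2)


print(hex(jump_target(0x80000008, 0x0000100)))
```

Because only the low 28 bits come from the instruction, a J-type jump stays within the 256-Mbyte region selected by the upper four bits of the program counter.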

Returns, dispatches, and large cross-page jumps are usually implemented
with the Jump Register or Jump and Link Register instructions. Both are
R-type instructions that take the 32-bit or 64-bit byte address contained in
one of the general purpose registers.

For more information about jump instructions, refer to the individual 
instruction as described in Appendix A. 

Overview of Branch Instructions 

All branch instruction target addresses are computed by adding the 
address of the instruction in the delay slot to the 16-bit offset (shifted left
2 bits and sign-extended to 32 bits). All branches occur with a delay of one
instruction. 
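
The branch target calculation can be sketched as follows. The Python example assumes 32-bit addresses and uses hypothetical function names; the delay-slot address is the address of the branch plus four.

```python
def sign_extend16(imm: int) -> int:
    """Interpret the low 16 bits of `imm` as a signed value."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm


def branch_target(branch_addr: int, offset16: int) -> int:
    """Delay-slot address plus the sign-extended offset shifted left 2 bits."""
    delay_slot_addr = branch_addr + 4
    return (delay_slot_addr + (sign_extend16(offset16) << 2)) & 0xFFFFFFFF


print(hex(branch_target(0x1000, 0x0003)))  # forward branch
print(hex(branch_target(0x1000, 0xFFFF)))  # offset of -1: branches to itself
```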

If a conditional branch likely is not taken, the instruction in the delay slot
is nullified.

For more information about branch instructions, refer to the individual 
instruction as described in Appendix A. 



† Taken branches have a 3-cycle penalty in this implementation. See Chapter 3 for more
information.

Special Instructions

Special instructions allow the software to initiate traps; they are always 
R-type. For more information about special instructions, refer to the 
individual instruction as described in Appendix A. 



Exception Instructions 

Exception instructions are extensions to the MIPS ISA. For more 
information about exception instructions, refer to the individual 
instruction as described in Appendix A. 



Coprocessor Instructions 

Coprocessor instructions perform operations in their respective 
coprocessors. Coprocessor loads and stores are I-type, and coprocessor 
computational instructions have coprocessor-dependent formats. 

Individual coprocessor instructions are described in Appendices A (for
CP0) and B (for the FPU, CP1).

CP0 instructions perform operations specifically on the System Control
Coprocessor registers to manipulate the memory management and
exception handling facilities of the processor. Appendix A details CP0
instructions.

This chapter describes the basic operation of the CPU pipeline, which
includes descriptions of the delay instructions (instructions that follow a
branch or load instruction in the pipeline), interruptions to the pipeline
flow caused by interlocks and exceptions, and the R4400 implementation of an
uncached store buffer.

The FPU pipeline is described in Chapter 6.

3.1 CPU Pipeline Operation

The CPU has an eight-stage instruction pipeline; each stage takes one 
PCycle (one cycle of PClock, which runs at twice the frequency of 
MasterClock). Thus, the execution of each instruction takes at least eight 
PCycles (four MasterClock cycles). An instruction can take longer — for 
example, if the required data is not in the cache, the data must be retrieved 
from main memory. 

Once the pipeline has been filled, eight instructions are executed 
simultaneously. Figure 3-1 shows the eight stages of the instruction 
pipeline; the next section describes the pipeline stages. 
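
The overlap of eight instructions can be modelled with a simple index calculation. The Python sketch below assumes an idealized pipeline with no stalls or slips, in which instruction number i enters the IF stage at PCycle i; the names are hypothetical.

```python
STAGES = ["IF", "IS", "RF", "EX", "DF", "DS", "TC", "WB"]


def stage_at(instruction: int, pcycle: int):
    """Stage occupied by `instruction` at `pcycle`, or None if it is not in the pipeline."""
    index = pcycle - instruction
    return STAGES[index] if 0 <= index < len(STAGES) else None


# At PCycle 7 the pipeline is full: instruction 0 is in WB, instruction 7 in IF.
occupancy = [stage_at(i, 7) for i in range(8)]
print(occupancy)

# Minimum latency of one instruction, in PCycles and in MasterClock cycles.
latency_pcycles = len(STAGES)
latency_master_cycles = latency_pcycles // 2
print(latency_pcycles, latency_master_cycles)
```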



[Diagram not reproduced. Eight overlapped instructions are shown, each entering the pipeline one PCycle after the previous one and passing through the IF, IS, RF, EX, DF, DS, TC, and WB stages. Two PCycles make up one MasterClock cycle, and in the current CPU cycle all eight stages are occupied.]

Figure 3-1 Instruction Pipeline Stages

3.2 CPU Pipeline Stages

This section describes each of the eight pipeline stages: 

• IF - Instruction Fetch, First Half 

• IS - Instruction Fetch, Second Half 

• RF - Register Fetch 

• EX - Execution 

• DF - Data Fetch, First Half 

• DS - Data Fetch, Second Half 

• TC - Tag Check 

• WB - Write Back 

IF - Instruction Fetch, First Half 

During the IF stage, the following occurs: 

• Branch logic selects an instruction address and the instruction 
cache fetch begins. 

• The instruction translation lookaside buffer (ITLB) begins the 
virtual-to-physical address translation. 

IS - Instruction Fetch, Second Half 

During the IS stage, the instruction cache fetch and the virtual-to-physical 
address translation are completed. 

RF - Register Fetch 

During the RF stage, the following occurs: 

• The instruction decoder (IDEC) decodes the instruction and 
checks for interlock conditions. 

• The instruction cache tag is checked against the page frame 
number obtained from the ITLB. 

• Any required operands are fetched from the register file.

EX - Execution

During the EX stage, one of the following occurs: 

• The arithmetic logic unit (ALU) performs the arithmetic or 
logical operation for register-to-register instructions. 

• The ALU calculates the data virtual address for load and store 
instructions. 

• The ALU determines whether the branch condition is true and 
calculates the virtual branch target address for branch 
instructions. 

DF - Data Fetch, First Half 

During the DF stage, one of the following occurs: 

• The data cache fetch and the data virtual-to-physical 
translation begins for load and store instructions. 

• The branch instruction address translation and translation
lookaside buffer (TLB)† update begins for branch instructions.

• No operations are performed during the DF, DS, and TC stages 
for register-to-register instructions. 

DS - Data Fetch, Second Half 

During the DS stage, one of the following occurs: 

• The data cache fetch and data virtual-to-physical translation 
are completed for load and store instructions. The Shifter 
aligns data to its word or doubleword boundary. 

• The branch instruction address translation and TLB update are 
completed for branch instructions. 

TC - Tag Check 

For load and store instructions, the cache performs the tag check during
the TC stage. The physical address from the TLB is checked against the
cache tag to determine if there is a hit or a miss.

† The TLB is described in Chapter 4.

WB - Write Back

For register-to-register instructions, the instruction result is written back 
to the register file during the WB stage. Branch instructions perform no 
operation during this stage. 

Figure 3-2 shows the activities occurring during each ALU pipeline stage,
for load, store, and branch instructions.

Figure 3-2 CPU Pipeline Activities

3.3 Branch Delay



The CPU pipeline has a branch delay of three cycles and a load delay of 
two cycles. The three-cycle branch delay is a result of the branch 
comparison logic operating during the EX pipeline stage of the branch, 
producing an instruction address that is available in the IF stage, four 
instructions later. 

Figure 3-3 illustrates the branch delay.

Figure 3-3 CPU Pipeline Branch Delay



3.4 Load Delay 

The completion of a load at the end of the DS pipeline stage produces an 
operand that is available for the EX pipeline stage of the third subsequent 
instruction. 
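
Both delay figures follow from the stage positions in an idealized pipeline. The Python sketch below derives them under the assumption that instruction number k enters IF at PCycle k and that a result produced in one stage is usable in the next PCycle; the names are hypothetical and the model ignores stalls and slips.

```python
STAGES = ["IF", "IS", "RF", "EX", "DF", "DS", "TC", "WB"]


def cycle_of(instruction: int, stage: str) -> int:
    """PCycle in which `instruction` occupies `stage` (no stalls assumed)."""
    return instruction + STAGES.index(stage)


def branch_delay() -> int:
    """Instructions fetched between a branch (instruction 0) and its target."""
    # The target address comes out of the branch's EX stage, so the target
    # can enter IF in the following PCycle, i.e. in instruction slot 4.
    target_slot = cycle_of(0, "EX") + 1
    return target_slot - 1


def load_delay() -> int:
    """Instructions between a load (instruction 0) and the first one that
    can use the loaded value in its EX stage without waiting."""
    data_ready = cycle_of(0, "DS") + 1
    first_user = next(k for k in range(1, 16) if cycle_of(k, "EX") >= data_ready)
    return first_user - 1


print(branch_delay(), load_delay())
```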

Figure 3-4 shows the load delay of two pipeline stages.

Figure 3-4 CPU Pipeline Load Delay

3.5 Interlock and Exception Handling

Smooth pipeline flow is interrupted when cache misses or exceptions 
occur, or when data dependencies are detected. Interruptions handled 
using hardware, such as cache misses, are referred to as interlocks, while 
those that are handled using software are called exceptions. 

As shown in Figure 3-5, all interlock and exception conditions are 
collectively referred to as faults. 



[Diagram not reproduced. Faults divide into exceptions, which are handled by software, and interlocks, which are handled by hardware.]




Figure 3-5 Interlocks, Exceptions, and Faults 

There are two types of interlocks: 

• stalls, which are resolved by halting the pipeline 

• slips, which require one part of the pipeline to advance while 
another part of the pipeline is held static 

At each cycle, exception and interlock conditions are checked for all active 
instructions. 

Because each exception or interlock condition corresponds to a particular
pipeline stage, a condition can be traced back to the particular instruction
in the exception/interlock stage, as shown in Figure 3-6. For instance, an
Illegal Instruction (II) exception is raised in the execution (EX) stage.

Tables 3-1 and 3-2 describe the pipeline interlocks and exceptions listed in
Figure 3-6.

Figure 3-6 Correspondence of Pipeline Stage to Interlock Condition

Table 3-1 Pipeline Exceptions

Exception   Description
ITLB        Instruction Translation or Address Exception
Intr        External Interrupt
IBE         IBus Error
IVACoh      IVA Coherent
II          Illegal Instruction
BP          Breakpoint
SC          System Call
CUn         Coprocessor Unusable
IECCErr     Instruction ECC Error
OVF         Integer Overflow
FPE         FP Interrupt
ExTrap      EX Stage Traps
DTLB        Data Translation or Address Exception
TLBMod      TLB Modified
DBE         Data Bus Error
Watch       Memory Reference Address Compare
DVACoh      DVA Coherent
DECCErr     Data ECC Error
NMI         Non-maskable Interrupt
Reset       Reset

Table 3-2 Pipeline Interlocks

Interlock   Description
ITM         Instruction TLB Miss
ICM         Instruction Cache Miss
CPBE        Coprocessor Possible Exception
SXT         Integer Sign Extend
STI         Store Interlock
DCM         Data Cache Miss
WA          Watch Address Exception
LDI         Load Interlock
MultB       Multiply Unit Busy
DivB        Divide Unit Busy
MDOne       Mult/Div One Cycle Slip
ShSlip      Var Shift or Shift > 32 bits
FCBsy       FP Busy



Exception Conditions 

When an exception condition occurs, the relevant instruction and all those
that follow it in the pipeline are cancelled. Accordingly, any stall
conditions and any later exception conditions that may have referenced
this instruction are inhibited; there is no benefit in servicing stalls for a
cancelled instruction.

After instruction cancellation, a new instruction stream begins, starting 
execution at a predefined exception vector. System Control Coprocessor 
registers are loaded with information that identifies the type of exception 
and auxiliary information such as the virtual address at which translation 
exceptions occur.

Stall Conditions

Often, a stall condition is only detected after parts of the pipeline have
advanced using incorrect data; this is called a pipeline overrun. When a stall
condition is detected, all eight instructions (one in each stage of the
pipeline) are frozen at once. In this stalled state, no pipeline stages can
advance until the interlock condition is resolved.

Once the interlock is removed, the restart sequence begins two cycles 
before the pipeline resumes execution. The restart sequence reverses the 
pipeline overrun by inserting the correct information into the pipeline. 



Slip Conditions 

When a slip condition is detected, pipeline stages that must advance to
resolve the dependency continue to be retired (completed), while
dependent stages are held until the required data is available.



External Stalls 

External stalls are another class of interlock. An external stall originates
outside the processor and is not referenced to a particular pipeline stage.
This interlock is not affected by exceptions.



Interlock and Exception Timing 

To prevent interlock and exception handling from adversely affecting the 
processor cycle time, the R4000 processor uses both logic and circuit 
pipeline techniques to reduce critical timing paths. Interlock and 
exception handling have the following effects on the pipeline: 

• In some cases, the processor pipeline must be backed up 
(reversed and started over again from a prior stage) to recover 
from interlocks. 

• In some cases, interlocks are serviced for instructions that will 
be aborted, due to an exception. 

These two cases are discussed below.

Backing Up the Pipeline

An example of pipeline back-up occurs in a data cache miss, in which the 
late detection of the miss causes a subsequent instruction to compute an 
incorrect result. 

When this occurs, not only must the cache miss be serviced but the EX 
stage of the dependent instruction must be re-executed before the pipeline 
can be restarted. Figure 3-7 illustrates this procedure; a minus (-) after 
the pipeline stage descriptor (for instance, EX-) indicates the operation 
produced an incorrect result, while a plus (+) indicates the successful 
re-execution of that operation. 



[Figure 3-7: pipeline timing diagram. The cycle states along the top are Run,
then Stall, with the restart cycles Rst2 and Rst1 preceding the return to
Run. Five successive instructions are shown; the fourth is marked EX- before
the stall and repeats RF and EX+ after the restart.]

Figure 3-7 Pipeline Overrun

Aborting an Instruction Subsequent to an Interlock

The interaction between an integer overflow and an instruction cache miss 
is an example of an interlock being serviced for an instruction that is 
subsequently aborted. 

In this case, pipelining the overflow exception handling into the DF stage
allows an instruction cache miss to occur on the next immediate
instruction. Figure 3-8 illustrates this; aborted instructions are indicated
with an asterisk (*).

Figure 3-8 Instruction Cache Miss

Even though the line brought in by the instruction cache could have been
replaced by a line of the exception handler, no performance loss occurs,
since the instruction cache miss would have been serviced anyway, after
returning from the exception handler. Handling of the exception is done
in this fashion because the frequency of an exception occurring is, by
definition, relatively low.

Pipelining the Exception Handling

Pipelining of interlock and exception handling is done by pipelining the
logical resolution of possible fault conditions with the buffering and
distributing of the pipeline control signals.

In particular, a half clock period is provided for buffering and distributing 
the run control signal; during this time the logic evaluation to produce run 
for the next cycle begins. Figure 3-9 shows this process for a sequence of 
loads. 



[Figure 3-9: timing diagram drawn against clock phases 1 and 2. Three
successive loads (Load1, Load2, Load3) each pass through the DF, DS, TC, and
WB stages; beneath each load, the TagCk, Resolve, and Buffer steps are shown.]

Figure 3-9 Pipelining of Interlock and Exception Handling

The decision whether or not to advance the pipeline is derived from these
three rules:

• All possible fault-causing events, such as cache misses, 
translation exceptions, load interlocks, etc., must be 
individually evaluated. 

• The fault to be serviced is selected, based on a predefined 
priority as determined by the pipeline stage of the asserted 
faults. 

• Pipeline advance control signals are buffered and distributed.

Figure 3-10 illustrates this process.
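
The three rules can be sketched in Python as follows. This is a minimal illustration, not the processor's control logic: the condition names and stage assignments passed in are hypothetical, and the priority rule used here (the fault asserted in the latest pipeline stage is serviced first) is an assumption, since the text says only that priority is determined by pipeline stage.

```python
# Pipeline stages in order, as they appear in Figure 3-7.
STAGES = ["IF", "IS", "RF", "EX", "DF", "DS", "TC", "WB"]


def select_fault(asserted):
    """Pick the fault to service from a list of (name, stage) pairs.

    Assumption: the fault in the latest stage has the highest priority.
    """
    if not asserted:
        return None
    return max(asserted, key=lambda fault: STAGES.index(fault[1]))


def pipeline_advance(conditions):
    """conditions: list of (name, stage, active) triples.

    Returns (run, fault): run is True when the pipeline may advance.
    """
    # Rule 1: evaluate every possible fault-causing event individually.
    asserted = [(name, stage) for name, stage, active in conditions if active]
    # Rule 2: select the fault to service by pipeline-stage priority.
    fault = select_fault(asserted)
    # Rule 3: derive the advance (run) signal to be buffered and distributed.
    run = fault is None
    return run, fault
```

For example, with a hypothetical instruction cache miss asserted in IS and a data cache miss asserted in DS, the sketch selects the data cache miss and deasserts run.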



[Figure 3-10: timing diagram drawn against clock phases 1 and 2 over four Run
cycles, showing three overlapped sequences of Evaluate, Resolve, and Buffer.]

Figure 3-10 Pipeline Advance Decision

Special Cases

In some instances, the pipeline control state machine is bypassed. This
occurs due to performance considerations or to correctness
considerations, which are described in the following sections.



Performance Considerations

A performance consideration occurs when there is a cache load miss. By 
bypassing the pipeline state machine, it is possible to eliminate up to two 
cycles of load miss latency. Two techniques, address acceleration and 
address prediction, increase performance. 



Address Acceleration 

Address acceleration bypasses a potential cache miss address. It is relatively
straightforward to perform this bypass since sending the cache miss
address to the secondary cache has no negative impact even if a
subsequent exception nullifies the effect of this cache access. Power is
wasted when the miss is inhibited by some fault, but this is a minor effect.



Address Prediction 

Another technique used to reduce miss latency is the automatic increment 
and transmission of instruction miss addresses following an instruction 
cache miss. This form of latency reduction is called address prediction: the 
subsequent instruction miss address is predicted to be a simple increment 
of the previous miss address. Figure 3-11 shows a cache miss in which the 
cache miss address is changed based on the detection of the miss.

Figure 3-11 Load Address Bypassing

Correctness Considerations

An example in which bypassing is necessary to guarantee correctness is a
cache write.

3.6 R4400 Processor Uncached Store Buffer

The R4400 processor contains an uncached store buffer to improve the
performance of uncached stores over that available from an R4000
processor. When an uncached store reaches the write-back (WB) stage in
the CPU pipeline, the CPU must stall until the store is sent off-chip. In the
R4400 processor, a single-entry buffer stores this uncached WB-stage data
on the chip without stalling the pipeline.

If a second uncached store reaches the WB stage in the R4400 processor
before the first uncached store has been moved off-chip, the CPU stalls
until the store buffer completes the first uncached store. To avoid this
stall, the compiler can insert seven instruction cycles between the two
uncached stores, as shown in Figure 3-12. A single instruction that
requires seven cycles to complete could be used in place of the seven No
Operation (NOP) instructions.



        SW      R2, (R3)        # uncached store
        NOP                     # NOP 1
        NOP                     # NOP 2
        NOP                     # NOP 3
        NOP                     # NOP 4
        NOP                     # NOP 5
        NOP                     # NOP 6
        NOP                     # NOP 7
        SW      R2, (R3)        # uncached store



Figure 3-12 Pipeline Sequence for Back-to-Back Uncached Stores 



If the two uncached stores execute within a loop, the two killed
instructions which are part of the loop branch latency are included in the
count of seven interpolated cycles. Figure 3-13 shows the four NOP
instructions that need to be scheduled in this case.

Loop:   SW      R2, (R3)        # uncached store
        NOP
        NOP
        NOP
        B       Loop            # branch to loop
        NOP
        killed                  # branch latency
        killed                  # branch latency



Figure 3-13 Back-to-Back Uncached Stores in a Loop 



The timing requirements of the System interface govern the latency
between uncached stores; back-to-back stores can be sent across the
interface at a maximum rate of one store for every four external cycles. If
the R4400 processor is programmed to run in divide-by-2 mode (for more
information about divided clock, see the description of SClock in Chapter
10), an uncached store can occur every eight pipeline cycles. If a larger
clock divisor is used, more pipeline cycles are required for each store.
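
The arithmetic behind the NOP counts in Figures 3-12 and 3-13 can be sketched as below. The function names are hypothetical, and scaling the store interval linearly with the clock divisor is an assumption: the text gives only the divide-by-2 case and says that larger divisors need more cycles.

```python
def pipeline_cycles_per_store(clock_divisor, external_cycles_per_store=4):
    """Pipeline cycles between uncached stores at the maximum interface rate.

    Assumes one external cycle lasts clock_divisor pipeline cycles.
    """
    return external_cycles_per_store * clock_divisor


def nops_between_stores(clock_divisor, other_cycles=0):
    """NOPs to schedule between two uncached stores to avoid the stall.

    other_cycles counts cycles already filled by other instructions,
    such as a loop branch and the killed instructions that follow it.
    """
    gap = pipeline_cycles_per_store(clock_divisor) - 1
    return max(gap - other_cycles, 0)
```

In divide-by-2 mode this gives seven NOPs for straight-line code. Inside the loop of Figure 3-13, the branch and the two killed instructions account for three of the seven cycles, leaving four NOPs.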

CAUTION: The R4000 processor always had a strongly-ordered
execution; however, with the addition of the uncached store buffer in
the R4400 there is a potential for out-of-order execution (described in
the section of the same name in Chapter 11, and Uncached Loads or
Stores in Chapter 12).

The MIPS R4000 processor provides a full-featured memory management
unit (MMU) which uses an on-chip translation lookaside buffer (TLB) to 
translate virtual addresses into physical addresses. 

This chapter describes the processor virtual and physical address spaces, 
the virtual-to-physical address translation, the operation of the TLB in 
making these translations, and those System Control Coprocessor (CP0)
registers that provide the software interface to the TLB.

4.1 Translation Lookaside Buffer (TLB)

Mapped virtual addresses are translated into physical addresses using an
on-chip TLB.† The TLB is a fully associative memory that holds 48 entries,
which provide mapping to 48 odd/even page pairs (96 pages). When
address mapping is indicated, each TLB entry is checked simultaneously
for a match with the virtual address that is extended with an ASID stored
in the EntryHi register.

Each mapped page ranges in size from 4 Kbytes to 16 Mbytes, in
multiples of 4; that is, 4K, 16K, 64K, 256K, 1M, 4M, 16M.
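
The seven page sizes form a geometric series, which the following short Python sketch enumerates. The names PAGE_SIZES and size_label are illustrative only.

```python
# Page sizes in bytes: 4 Kbytes multiplied by successive powers of 4.
PAGE_SIZES = [4096 * 4**k for k in range(7)]


def size_label(size):
    """Format a page size the way the text abbreviates it (4K ... 16M)."""
    if size >= 1 << 20:
        return f"{size >> 20}M"
    return f"{size >> 10}K"


labels = [size_label(s) for s in PAGE_SIZES]
```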



Hits and Misses 

If there is a virtual address match, or hit, in the TLB, the physical page 
number is extracted from the TLB and concatenated with the offset to form 
the physical address (see Figure 4-1). 

If no match occurs (TLB miss), an exception is taken and software refills 
the TLB from the page table resident in memory. Software can write over 
a selected TLB entry or use a hardware mechanism to write into a random 
entry. 



Multiple Matches 

If more than one entry in the TLB matches the virtual address being
translated, the operation is undefined. To prevent permanent damage to
the part, the TLB may be disabled if more than several entries match. The
TLB-Shutdown (TS) bit in the Status register is set to 1 if the TLB is
disabled.



† There are virtual-to-physical address translations that occur outside of the TLB. For
example, addresses in the kseg0 and kseg1 spaces are unmapped translations. In these
spaces the physical address is derived by subtracting the base address of the space from
the virtual address.

4.2 Address Spaces

This section describes the virtual and physical address spaces and the 
manner in which virtual addresses are converted or "translated" into 
physical addresses in the TLB. 

Virtual Address Space 

The processor virtual address can be either 32 or 64 bits wide,† depending
on whether the processor is operating in 32-bit or 64-bit mode.

• In 32-bit mode, addresses are 32 bits wide. The maximum user
process size is 2 gigabytes (2^31 bytes).

• In 64-bit mode, addresses are 64 bits wide. The maximum user
process size is 1 terabyte (2^40 bytes).

Figure 4-1 shows the translation of a virtual address into a physical 
address. 



[Figure 4-1: block diagram of a virtual address passing through the TLB to
produce a physical address, annotated with the following steps.]

1. The virtual address (VA), represented by the virtual page number (VPN),
is compared with the tag in the TLB.

2. If there is a match, the page frame number (PFN) representing the upper
bits of the physical address (PA) is output from the TLB.

3. The Offset, which does not pass through the TLB, is then concatenated
to the PFN.

Figure 4-1 Overview of a Virtual-to-Physical Address Translation

† Figure 4-8 shows the 32-bit and 64-bit versions of the processor TLB entry.

As shown in Figures 4-2 and 4-3, the virtual address is extended with an
8-bit address space identifier (ASID), which reduces the frequency of TLB
flushing when switching contexts. This 8-bit ASID is in the CP0 EntryHi
register, described later in this chapter. The Global bit (G) is in the EntryLo0
and EntryLo1 registers, described later in this chapter.

Physical Address Space 

Using a 36-bit address, the processor physical address space encompasses 
64 gigabytes. The section following describes the translation of a virtual 
address to a physical address. 

Virtual-to-Physical Address Translation 

Converting a virtual address to a physical address begins by comparing
the virtual address from the processor with the virtual addresses in the
TLB; there is a match when the virtual page number (VPN) of the address
is the same as the VPN field of the entry, and either:

• the Global (G) bit of the TLB entry is set, or 

• the ASID field of the virtual address is the same as the ASID 
field of the TLB entry. 

This match is referred to as a TLB hit. If there is no match, a TLB Miss
exception is taken by the processor and software is allowed to refill the
TLB from a page table of virtual/physical addresses in memory.

If there is a virtual address match in the TLB, the physical address is
output from the TLB and concatenated with the Offset, which represents
an address within the page frame space. The Offset does not pass through
the TLB.
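
The match rule and the concatenation step can be sketched in Python as follows. This is a simplified model under stated assumptions, not the hardware: each hypothetical TlbEntry maps a single page rather than an odd/even pair, all pages share one fixed size (offset_bits), and a miss is reported by returning None instead of raising a TLB Miss exception.

```python
from dataclasses import dataclass


@dataclass
class TlbEntry:
    vpn: int          # virtual page number tag
    asid: int         # address space identifier
    global_bit: bool  # G bit: when set, the ASID comparison is ignored
    pfn: int          # page frame number


def tlb_lookup(entries, vpn, asid):
    """Return the PFN of the matching entry, or None on a TLB miss."""
    for entry in entries:
        if entry.vpn == vpn and (entry.global_bit or entry.asid == asid):
            return entry.pfn
    return None


def translate(entries, va, asid, offset_bits=12):
    """Translate va by matching its VPN and re-attaching the Offset."""
    vpn = va >> offset_bits
    offset = va & ((1 << offset_bits) - 1)  # the Offset bypasses the TLB
    pfn = tlb_lookup(entries, vpn, asid)
    if pfn is None:
        return None
    return (pfn << offset_bits) | offset
```

A global entry hits for any ASID, while a non-global entry hits only when the ASID of the address equals the ASID stored in the entry.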

Virtual-to-physical translation is described in greater detail throughout
the remainder of this chapter; Figure 4-20 is a flow diagram of the process
shown at the end of this chapter.

The next two sections describe the 32-bit and 64-bit address translations.

32-bit Mode Address Translation

Figure 4-2 shows the virtual-to-physical-address translation of a 32-bit 
mode address. 

• The top portion of Figure 4-2 shows a virtual address with a
12-bit, or 4-Kbyte, page size, labelled Offset. The remaining 20
bits of the address represent the VPN, and index the 1M-entry
page table.

• The bottom portion of Figure 4-2 shows a virtual address with
a 24-bit, or 16-Mbyte, page size, labelled Offset. The remaining
8 bits of the address represent the VPN, and index the 256-entry
page table.

Figure 4-2 32-bit Mode Virtual Address Translation

64-bit Mode Address Translation

Figure 4-3 shows the virtual-to-physical-address translation of a 64-bit 
mode address. This figure illustrates the two extremes in the range of 
possible page sizes: a 4-Kbyte page (12 bits) and a 16-Mbyte page (24 bits). 
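
The VPN widths and page table sizes quoted for Figures 4-2 and 4-3 follow directly from the page size, as this small Python sketch shows. The function names are illustrative, and the 40-bit width used for 64-bit mode is inferred from the 2^40-byte user space stated earlier rather than taken from the figure itself.

```python
def vpn_layout(va_bits, offset_bits):
    """Return (VPN width in bits, number of page table entries indexed)."""
    vpn_bits = va_bits - offset_bits
    return vpn_bits, 1 << vpn_bits


def split_address(va, va_bits, offset_bits):
    """Split a virtual address into (VPN, Offset)."""
    va &= (1 << va_bits) - 1
    return va >> offset_bits, va & ((1 << offset_bits) - 1)
```

With a 32-bit address, a 4-Kbyte page leaves a 20-bit VPN and a 16-Mbyte page leaves an 8-bit VPN; with 40 translated bits, the same page sizes leave 28-bit and 16-bit VPNs.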

• The top portion of Figure 4-3 shows a virtual address with a
12-bit, or 4-Kbyte, page size, labelled Offset. The remaining 28
bits of the address represent the VPN, and index the 256M-entry
page table.

• The bottom portion of Figure 4-3 shows a virtual address with
a 24-bit, or 16-Mbyte, page size, labelled Offset. The remaining
16 bits of the address represent the VPN, and index the 64K-entry
page table.

Figure 4-3 64-bit Mode Virtual Address Translation

Operating Modes

The processor has three operating modes that function in both 32- and
64-bit operations:

• User mode 

• Supervisor mode 

• Kernel mode 

These modes are described in the next three sections. 

User Mode Operations 

In User mode, a single, uniform virtual address space, labelled User
segment, is available; its size is:

• 2 Gbytes (2^31 bytes) in 32-bit mode (useg)

• 1 Tbyte (2^40 bytes) in 64-bit mode (xuseg)

Figure 4-4 shows User mode virtual address space.



[Figure 4-4: User mode address maps.
32-bit*: useg occupies 0x0000 0000 up to 0x8000 0000; the map extends to
0xFFFF FFFF.
64-bit: xuseg occupies 0x0000 0000 0000 0000 up to 0x0000 0100 0000 0000;
the map extends to 0xFFFF FFFF FFFF FFFF.]

Figure 4-4 User Mode Virtual Address Space

*NOTE: The R4000 uses 64-bit addresses internally. When the kernel
is running in Kernel mode, it initializes registers before switching
modes, and saves (or restores, whichever is appropriate) register
values on context switches. In 32-bit mode, a valid address must be a
32-bit signed number, where bits 63:32 = bit 31. In normal operation
it is not possible for a 32-bit User-mode program to produce invalid
addresses. However, although it would be an error, it is possible for a
Kernel-mode program to erroneously place a value that is not a 32-bit
signed number into a 64-bit register, in which case the User-mode
program generates an invalid address.
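
The validity condition in this note (bits 63:32 equal to bit 31) can be checked with a few lines of Python. The function names are illustrative, and the register value is modelled as an unsigned 64-bit integer.

```python
def sign_extend_32(value):
    """Sign-extend the low 32 bits of value to an unsigned 64-bit pattern."""
    value &= 0xFFFFFFFF
    if value & 0x80000000:
        return value | 0xFFFFFFFF00000000
    return value


def is_valid_32bit_address(reg64):
    """True when bits 63:32 of the 64-bit register all equal bit 31."""
    reg64 &= 0xFFFFFFFFFFFFFFFF
    return reg64 == sign_extend_32(reg64)
```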






The User segment starts at address 0 and the current active user process
resides in either useg (in 32-bit mode) or xuseg (in 64-bit mode). The TLB
identically maps all references to useg/xuseg from all modes, and controls
cache accessibility.†

The processor operates in User mode when the Status register contains the
following bit-values:

• KSU bits = 10₂

• EXL = 0 

• ERL = 0 

In conjunction with these bits, the UX bit in the Status register selects 
between 32- or 64-bit User mode addressing as follows: 

• when UX = 0, 32-bit useg space is selected and TLB misses are 
handled by the 32-bit TLB refill exception handler 

• when UX = 1, 64-bit xuseg space is selected and TLB misses are 
handled by the 64-bit XTLB refill exception handler 

Table 4-1 lists the characteristics of the two user mode segments, useg and 
xuseg. 



Table 4-1 32-bit and 64-bit User Mode Segments

Address Bit     Status Register Bit Values   Segment   Address Range                   Segment Size
Values          KSU    EXL    ERL    UX      Name

32-bit          10₂    0      0      0       useg      0x0000 0000 through             2 Gbytes
A(31) = 0                                              0x7FFF FFFF                     (2^31 bytes)

64-bit          10₂    0      0      1       xuseg     0x0000 0000 0000 0000 through   1 Tbyte
A(63:40) = 0                                           0x0000 00FF FFFF FFFF           (2^40 bytes)

† The cached (C) field in a TLB entry determines whether the reference is cached; see Figure
4-8.

32-bit User Mode (useg)

In User mode, when UX = 0 in the Status register, User mode addressing
is compatible with the 32-bit addressing model shown in Figure 4-4, and a
2-Gbyte user address space is available, labelled useg.

All valid User mode virtual addresses have their most-significant bit 
cleared to 0; any attempt to reference an address with the most-significant 
bit set while in User mode causes an Address Error exception. 

The system maps all references to useg through the TLB, and bit settings 
within the TLB entry for the page determine the cacheability of a reference. 

64-bit User Mode (xuseg) 

In User mode, when UX = 1 in the Status register, User mode addressing is
extended to the 64-bit model shown in Figure 4-4. In 64-bit User mode, the
processor provides a single, uniform address space of 2^40 bytes, labelled
xuseg.

All valid User mode virtual addresses have bits 63:40 equal to 0; an 
attempt to reference an address with bits 63:40 not equal to 0 causes an 
Address Error exception. 
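
The two User mode address checks can be summarized in a short Python sketch. The function name is hypothetical; it returns True when the address lies in useg or xuseg and False when the reference would cause an Address Error exception. In 32-bit mode the sketch inspects only bit 31 of the 32-bit address.

```python
def user_address_ok(va, ux):
    """Return True if a User mode reference to va is legal.

    ux is the UX bit of the Status register: 0 selects useg, 1 selects xuseg.
    """
    if ux:
        va &= (1 << 64) - 1
        return (va >> 40) == 0        # bits 63:40 must all be 0
    return ((va >> 31) & 1) == 0      # most-significant bit must be 0
```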

Supervisor Mode Operations 

Supervisor mode is designed for layered operating systems in which a
true kernel runs in R4000 Kernel mode, and the rest of the operating
system runs in Supervisor mode.

The processor operates in Supervisor mode when the Status register
contains the following bit-values:

• KSU = 01₂

• EXL = 0 

• ERL = 0 

In conjunction with these bits, the SX bit in the Status register selects 
between 32- or 64-bit Supervisor mode addressing: 

• when SX = 0, 32-bit supervisor space is selected and TLB
misses are handled by the 32-bit TLB refill exception handler

• when SX = 1, 64-bit supervisor space is selected and TLB
misses are handled by the 64-bit XTLB refill exception handler

Figure 4-5 shows Supervisor mode address mapping. Table 4-2 lists the
characteristics of the supervisor mode segments; descriptions of the
address spaces follow.

Figure 4-5 Supervisor Mode Address Space

*NOTE: The R4000 uses 64-bit addresses internally. In 32-bit mode,
a valid address must be a 32-bit signed number, where bits 63:32 = bit
31. In normal operation it is not possible for a 32-bit Supervisor-mode
program to create an invalid address through arithmetic operations.
However, 32-bit-mode Supervisor programs must not create addresses
using base register+offset calculations that produce a 32-bit
2's-complement overflow; specifically, there are two prohibited cases:

• offset with bit 15 = 0 and base register with bit 31 = 0, but (base
register+offset) bit 31 = 1

• offset with bit 15 = 1 and base register with bit 31 = 1, but (base
register+offset) bit 31 = 0
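
The two prohibited cases amount to a 32-bit 2's-complement overflow test on the base register plus the sign-extended 16-bit offset. A minimal Python sketch, with hypothetical function names, is:

```python
def sign_extend_16(offset):
    """Interpret the low 16 bits of offset as a signed value."""
    offset &= 0xFFFF
    return offset - 0x10000 if offset & 0x8000 else offset


def overflows_32bit(base, offset):
    """True when base+offset falls into one of the two prohibited cases."""
    base32 = base & 0xFFFFFFFF
    total = (base32 + sign_extend_16(offset)) & 0xFFFFFFFF
    base_bit31 = base32 >> 31
    offset_bit15 = (offset >> 15) & 1
    sum_bit31 = total >> 31
    # Overflow: base and offset have the same sign, but the sum does not.
    return base_bit31 == offset_bit15 and sum_bit31 != base_bit31
```

For example, base 0x7FFF FFFF with offset 0x0001 matches the first case, and base 0x8000 0000 with offset 0xFFFF matches the second.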

Using this invalid address produces an undefined result.

Table 4-2 32-bit and 64-bit Supervisor Mode Segments

Address Bit        Status Register Bit Values   Segment   Address Range                   Segment Size
Values             KSU    EXL    ERL    SX      Name

32-bit             01₂    0      0      0       suseg     0x0000 0000 through             2 Gbytes
A(31) = 0                                                 0x7FFF FFFF                     (2^31 bytes)

32-bit             01₂    0      0      0       sseg      0xC000 0000 through             512 Mbytes
A(31:29) = 110₂                                           0xDFFF FFFF                     (2^29 bytes)

64-bit             01₂    0      0      1       xsuseg    0x0000 0000 0000 0000 through   1 Tbyte
A(63:62) = 00₂                                            0x0000 00FF FFFF FFFF           (2^40 bytes)

64-bit             01₂    0      0      1       xsseg     0x4000 0000 0000 0000 through   1 Tbyte
A(63:62) = 01₂                                            0x4000 00FF FFFF FFFF           (2^40 bytes)

64-bit             01₂    0      0      1       csseg     0xFFFF FFFF C000 0000 through   512 Mbytes
A(63:62) = 11₂                                            0xFFFF FFFF DFFF FFFF           (2^29 bytes)
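
The rows of Table 4-2 can be read as a classification of Supervisor mode addresses. The Python sketch below is illustrative only; the function name is hypothetical, and None simply means that the address lies outside every segment listed in the table.

```python
def supervisor_segment(va, sx):
    """Name the Supervisor mode segment containing va, or None.

    sx is the SX bit of the Status register (0: 32-bit, 1: 64-bit).
    """
    if not sx:
        va &= 0xFFFFFFFF
        if va >> 31 == 0:
            return "suseg"
        if va >> 29 == 0b110:
            return "sseg"
        return None
    va &= (1 << 64) - 1
    if va <= 0x000000FFFFFFFFFF:
        return "xsuseg"
    if 0x4000000000000000 <= va <= 0x400000FFFFFFFFFF:
        return "xsseg"
    if 0xFFFFFFFFC0000000 <= va <= 0xFFFFFFFFDFFFFFFF:
        return "csseg"
    return None
```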



32-bit Supervisor Mode, User Space (suseg) 

In Supervisor mode, when SX = 0 in the Status register and the
most-significant bit of the 32-bit virtual address is set to 0, the suseg
virtual address space is selected; it covers the full 2^31 bytes (2 Gbytes)
of the current user address space. The virtual address is extended with the
contents of the 8-bit ASID field to form a unique virtual address.

This mapped space starts at virtual address 0x0000 0000 and runs through 
0x7FFF FFFF. 

32-bit Supervisor Mode, Supervisor Space (sseg) 

In Supervisor mode, when SX = 0 in the Status register and the three
most-significant bits of the 32-bit virtual address are 110₂, the sseg virtual
address space is selected; it covers 2^29 bytes (512 Mbytes) of the current
supervisor address space. The virtual address is extended with the
contents of the 8-bit ASID field to form a unique virtual address.

This mapped space begins at virtual address 0xC000 0000 and runs
through 0xDFFF FFFF.

64-bit Supervisor Mode, User Space (xsuseg)

In Supervisor mode, when SX = 1 in the Status register and bits 63:62 of the
virtual address are set to 00₂, the xsuseg virtual address space is selected;
it covers the full 2^40 bytes (1 Tbyte) of the current user address space. The
virtual address is extended with the contents of the 8-bit ASID field to
form a unique virtual address.

This mapped space starts at virtual address 0x0000 0000 0000 0000 and
runs through 0x0000 00FF FFFF FFFF.

64-bit Supervisor Mode, Current Supervisor Space (xsseg)

In Supervisor mode, when SX = 1 in the Status register and bits 63:62 of the
virtual address are set to 01₂, the xsseg current supervisor virtual address
space is selected. The virtual address is extended with the contents of the
8-bit ASID field to form a unique virtual address.

This mapped space begins at virtual address 0x4000 0000 0000 0000 and
runs through 0x4000 00FF FFFF FFFF.

64-bit Supervisor Mode, Separate Supervisor Space (csseg) 

In Supervisor mode, when SX = 1 in the Status register and bits 63:62 of the
virtual address are set to 11₂, the csseg separate supervisor virtual address
space is selected. Addressing of the csseg is compatible with addressing
sseg in 32-bit mode. The virtual address is extended with the contents of
the 8-bit ASID field to form a unique virtual address.

This mapped space begins at virtual address 0xFFFF FFFF C000 0000 and
runs through 0xFFFF FFFF DFFF FFFF.

Kernel Mode Operations

The processor operates in Kernel mode when the Status register contains
one of the following values:

• KSU = 00₂

• EXL = 1

• ERL = 1

In conjunction with these bits, the KX bit in the Status register selects
between 32- or 64-bit Kernel mode addressing:

• when KX = 0, 32-bit kernel space is selected and all TLB misses
are handled by the 32-bit TLB refill exception handler

• when KX = 1, 64-bit kernel space is selected and all TLB misses
are handled by the 64-bit XTLB refill exception handler

The processor enters Kernel mode whenever an exception is detected and
it remains in Kernel mode until an Exception Return (ERET) instruction is
executed. The ERET instruction restores the processor to the mode
existing prior to the exception.
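
The Status register conditions given for the three operating modes can be combined into one decision, sketched below in Python. The function names are hypothetical, and the field values are passed in directly rather than extracted from a register image, because the bit positions are not given in this section.

```python
def operating_mode(ksu, exl, erl):
    """Decode the operating mode from the KSU, EXL, and ERL fields."""
    if exl or erl or ksu == 0b00:
        return "kernel"        # any one of the three conditions suffices
    if ksu == 0b01:
        return "supervisor"    # requires EXL = 0 and ERL = 0
    if ksu == 0b10:
        return "user"          # requires EXL = 0 and ERL = 0
    return None                # KSU = 11 is not described in this section


def uses_64bit_addressing(mode, ux, sx, kx):
    """Select the UX, SX, or KX bit that applies to the given mode."""
    return bool({"user": ux, "supervisor": sx, "kernel": kx}[mode])
```

Because EXL = 1 or ERL = 1 forces Kernel mode regardless of KSU, an exception taken in User mode is handled in Kernel mode without the KSU field changing.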

Kernel mode virtual address space is divided into regions differentiated 
by the high-order bits of the virtual address, as shown in Figure 4-6. Table 
4-3 lists the characteristics of the 32-bit kernel mode segments, and Table 
4-4 lists the characteristics of the 64-bit kernel mode segments.

[Figure 4-6: Kernel mode address maps, listed from the highest region to the
lowest. Each line gives the starting address of a region.]

32-bit* (map extends to 0xFFFF FFFF)

Start address   Region                      Segment
0xE000 0000     0.5 GB Mapped               kseg3
0xC000 0000     0.5 GB Mapped               ksseg
0xA000 0000     0.5 GB Unmapped Uncached    kseg1
0x8000 0000     0.5 GB Unmapped Cached      kseg0
0x0000 0000     2 GB Mapped                 kuseg

64-bit (map extends to 0xFFFF FFFF FFFF FFFF)

Start address            Region                      Segment
0xFFFF FFFF E000 0000    0.5 GB Mapped               ckseg3
0xFFFF FFFF C000 0000    0.5 GB Mapped               cksseg
0xFFFF FFFF A000 0000    0.5 GB Unmapped Uncached    ckseg1
0xFFFF FFFF 8000 0000    0.5 GB Unmapped Cached      ckseg0
0xC000 00FF 8000 0000    Address error
0xC000 0000 0000 0000    Mapped                      xkseg
0x8000 0000 0000 0000    Unmapped                    xkphys
0x4000 0100 0000 0000    Address error
0x4000 0000 0000 0000    1 TB Mapped                 xksseg
0x0000 0100 0000 0000    Address error
0x0000 0000 0000 0000    1 TB Mapped                 xkuseg

Figure 4-6 Kernel Mode Address Space

*NOTE: The R4000 uses 64-bit addresses internally. In 32-bit mode,
a valid address must be a 32-bit signed number, where bits 63:32 = bit
31; an invalid address produces an undefined result. In 32-bit mode,
a Kernel-mode program may use 64-bit instructions, but must not
create addresses using base register+offset calculations that produce a
32-bit 2's-complement overflow; specifically, there are two prohibited
cases:

• offset with bit 15 = 0 and base register with bit 31 = 0, but (base
register+offset) bit 31 = 1

• offset with bit 15 = 1 and base register with bit 31 = 1, but (base
register+offset) bit 31 = 0

Table 4-3 32-bit Kernel Mode Segments

In every row, the Status register is one of these values: KSU = 00₂, or
EXL = 1, or ERL = 1.

Address Bit        KX   Segment   Address Range                     Segment Size
Values                  Name

A(31) = 0          0    kuseg     0x0000 0000 through 0x7FFF FFFF   2 Gbytes (2^31 bytes)
A(31:29) = 100₂    0    kseg0     0x8000 0000 through 0x9FFF FFFF   512 Mbytes (2^29 bytes)
A(31:29) = 101₂    0    kseg1     0xA000 0000 through 0xBFFF FFFF   512 Mbytes (2^29 bytes)
A(31:29) = 110₂    0    ksseg     0xC000 0000 through 0xDFFF FFFF   512 Mbytes (2^29 bytes)
A(31:29) = 111₂    0    kseg3     0xE000 0000 through 0xFFFF FFFF   512 Mbytes (2^29 bytes)



32-bit Kernel Mode, User Space (kuseg) 

In Kernel mode, when KX = 0 in the Status register, and the
most-significant bit of the virtual address, A31, is cleared, the 32-bit kuseg
virtual address space is selected; it covers the full 2^31 bytes (2 Gbytes) of
the current user address space. The virtual address is extended with the
contents of the 8-bit ASID field to form a unique virtual address.

When ERL = 1 in the Status register, the user address region becomes a
2^31-byte unmapped (that is, mapped directly to physical addresses)
uncached address space. See the Cache Error exception in Chapter 5 for
more information.

32-bit Kernel Mode, Kernel Space 0 (kseg0)

In Kernel mode, when KX = 0 in the Status register and the
most-significant three bits of the virtual address are 100₂, 32-bit kseg0
virtual address space is selected; it is the 2^29-byte (512-Mbyte) kernel
physical space. References to kseg0 are not mapped through the TLB; the
physical address selected is defined by subtracting 0x8000 0000 from the
virtual address. The K0 field of the Config register, described in this
chapter, controls cacheability and coherency.

32-bit Kernel Mode, Kernel Space 1 (kseg1)

In Kernel mode, when KX = 0 in the Status register and the
most-significant three bits of the 32-bit virtual address are 101₂, 32-bit
kseg1 virtual address space is selected; it is the 2^29-byte (512-Mbyte)
kernel physical space.

References to kseg1 are not mapped through the TLB; the physical address
selected is defined by subtracting 0xA000 0000 from the virtual address.

Caches are disabled for accesses to these addresses, and physical memory
(or memory-mapped I/O device registers) is accessed directly.
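
The unmapped translations for kseg0 and kseg1 are simple subtractions, as the following Python sketch shows. The function and constant names are illustrative; the base addresses are the ones given in the text, and None is returned for addresses in any other segment.

```python
KSEG0_BASE = 0x80000000  # unmapped; cacheability set by the Config K0 field
KSEG1_BASE = 0xA0000000  # unmapped, uncached


def unmapped_physical(va):
    """Physical address for a kseg0 or kseg1 virtual address, else None."""
    va &= 0xFFFFFFFF
    top3 = va >> 29
    if top3 == 0b100:
        return va - KSEG0_BASE
    if top3 == 0b101:
        return va - KSEG1_BASE
    return None
```

Both segments therefore alias the same low 512 Mbytes of physical memory, differing only in cache behavior.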

32-bit Kernel Mode, Supervisor Space (ksseg) 

In Kernel mode, when KX = 0 in the Status register and the most-significant
three bits of the 32-bit virtual address are 110₂, the ksseg virtual
address space is selected; it is the current 2^29-byte (512-Mbyte) supervisor
virtual space. The virtual address is extended with the contents of the
8-bit ASID field to form a unique virtual address.

32-bit Kernel Mode, Kernel Space 3 (kseg3)

In Kernel mode, when KX = 0 in the Status register and the most-significant
three bits of the 32-bit virtual address are 111₂, the kseg3 virtual
address space is selected; it is the current 2^29-byte (512-Mbyte) kernel
virtual space. The virtual address is extended with the contents of the 8-bit
ASID field to form a unique virtual address.
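
The 32-bit Kernel mode segment selection described above can be sketched
in C. This is an illustrative model only, not processor or operating
system code: the names classify_kernel32 and unmapped_to_physical are
hypothetical, and the sketch assumes KX = 0 and that any address with the
top bit clear falls in the mapped user region described earlier.

```c
#include <stdint.h>

/* Segments visible in 32-bit Kernel mode (KX = 0). */
typedef enum { KUSEG, KSEG0, KSEG1, KSSEG, KSEG3 } kseg32_t;

/* Classify a 32-bit virtual address by its three most-significant bits. */
kseg32_t classify_kernel32(uint32_t vaddr)
{
    switch (vaddr >> 29) {
    case 4:  return KSEG0;  /* 100: unmapped, cacheability from Config K0 */
    case 5:  return KSEG1;  /* 101: unmapped, uncached */
    case 6:  return KSSEG;  /* 110: mapped supervisor space */
    case 7:  return KSEG3;  /* 111: mapped kernel space */
    default: return KUSEG;  /* 0xx: mapped user space (assumption, see lead-in) */
    }
}

/* For kseg0 and kseg1, compute the physical address by subtraction and
 * return 1. For mapped segments return 0; those need a TLB lookup. */
int unmapped_to_physical(uint32_t vaddr, uint32_t *paddr)
{
    switch (classify_kernel32(vaddr)) {
    case KSEG0: *paddr = vaddr - 0x80000000u; return 1;
    case KSEG1: *paddr = vaddr - 0xA0000000u; return 1;
    default:    return 0;
    }
}
```

Because kseg0 and kseg1 each cover 512 Mbytes, both map onto the same low
512 Mbytes of physical memory; only the cacheability differs.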






MIPS R4000 Microprocessor User's Manual 



Memory Management 



Table 4-4 64-bit Kernel Mode Segments

For every row, the Status register holds one of KSU = 00₂, EXL = 1, or
ERL = 1, together with KX = 1.

Address Bit Values   KX  Segment  Address Range                  Segment Size
                         Name
-------------------  --  -------  -----------------------------  ---------------------
A(63:62) = 00₂       1   xkuseg   0x0000 0000 0000 0000 through  1 Tbyte
                                  0x0000 00FF FFFF FFFF          (2^40 bytes)

A(63:62) = 01₂       1   xksseg   0x4000 0000 0000 0000 through  1 Tbyte
                                  0x4000 00FF FFFF FFFF          (2^40 bytes)

A(63:62) = 10₂       1   xkphys   0x8000 0000 0000 0000 through  8 2^36-byte
                                  0xBFFF FFFF FFFF FFFF          spaces

A(63:62) = 11₂       1   xkseg    0xC000 0000 0000 0000 through  (2^40 - 2^31)
                                  0xC000 00FF 7FFF FFFF          bytes

A(63:62) = 11₂       1   ckseg0   0xFFFF FFFF 8000 0000 through  512 Mbytes
A(61:31) = -1                     0xFFFF FFFF 9FFF FFFF          (2^29 bytes)

A(63:62) = 11₂       1   ckseg1   0xFFFF FFFF A000 0000 through  512 Mbytes
A(61:31) = -1                     0xFFFF FFFF BFFF FFFF          (2^29 bytes)

A(63:62) = 11₂       1   cksseg   0xFFFF FFFF C000 0000 through  512 Mbytes
A(61:31) = -1                     0xFFFF FFFF DFFF FFFF          (2^29 bytes)

A(63:62) = 11₂       1   ckseg3   0xFFFF FFFF E000 0000 through  512 Mbytes
A(61:31) = -1                     0xFFFF FFFF FFFF FFFF          (2^29 bytes)



64-bit Kernel Mode, User Space (xkuseg) 

In Kernel mode, when KX = 1 in the Status register and bits 63:62 of the
64-bit virtual address are 00₂, the xkuseg virtual address space is selected;
it covers the current user address space. The virtual address is extended
with the contents of the 8-bit ASID field to form a unique virtual address.

When ERL = 1 in the Status register, the user address region becomes a
2^31-byte unmapped (that is, mapped directly to physical addresses) uncached
address space. See the Cache Error exception in Chapter 5 for more
information.

64-bit Kernel Mode, Current Supervisor Space (xksseg)

In Kernel mode, when KX = 1 in the Status register and bits 63:62 of the
64-bit virtual address are 01₂, the xksseg virtual address space is selected;
it is the current supervisor virtual space. The virtual address is extended
with the contents of the 8-bit ASID field to form a unique virtual address.

64-bit Kernel Mode, Physical Spaces (xkphys)

In Kernel mode, when KX = 1 in the Status register and bits 63:62 of the
64-bit virtual address are 10₂, the xkphys virtual address space is selected;
it is a set of eight 2^36-byte kernel physical spaces. Accesses with address
bits 58:36 not equal to 0 cause an address error.

References to this space are not mapped; the physical address selected is
taken from bits 35:0 of the virtual address. Bits 61:59 of the virtual address
specify the cacheability and coherency attributes, as shown in Table 4-5.



Table 4-5 Cacheability and Coherency Attributes

Value (61:59)  Cacheability and Coherency Attributes   Starting Address
-------------  --------------------------------------  ---------------------
0              Reserved                                0x8000 0000 0000 0000
1              Reserved                                0x8800 0000 0000 0000
2              Uncached                                0x9000 0000 0000 0000
3              Cacheable, noncoherent                  0x9800 0000 0000 0000
4              Cacheable, coherent exclusive           0xA000 0000 0000 0000
5              Cacheable, coherent exclusive on write  0xA800 0000 0000 0000
6              Cacheable, coherent update on write     0xB000 0000 0000 0000
7              Reserved                                0xB800 0000 0000 0000
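
As an illustration of how an xkphys address splits into its parts, the
following C sketch extracts the attribute value (bits 61:59) and the
physical address (bits 35:0), and rejects addresses whose bits 58:36 are
nonzero. The type xkphys_ref_t and the function decode_xkphys are
hypothetical names used only for this example.

```c
#include <stdint.h>

typedef struct {
    unsigned cache_attr;  /* virtual address bits 61:59, see Table 4-5 */
    uint64_t paddr;       /* virtual address bits 35:0 */
} xkphys_ref_t;

/* Returns 1 and fills *out for a well-formed xkphys address.
 * Returns 0 if bits 63:62 are not 10 (not xkphys) or if bits 58:36
 * are nonzero, the case the text describes as an address error. */
int decode_xkphys(uint64_t vaddr, xkphys_ref_t *out)
{
    if ((vaddr >> 62) != 2u)
        return 0;
    if (((vaddr >> 36) & 0x7FFFFFu) != 0)      /* bits 58:36 (23 bits) */
        return 0;
    out->cache_attr = (unsigned)((vaddr >> 59) & 0x7u);
    out->paddr = vaddr & 0xFFFFFFFFFull;       /* low 36 bits */
    return 1;
}
```

For example, 0x9000 0000 1234 5678 decodes to attribute 2 (uncached) and
physical address 0x1234 5678.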
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64-bit Kernel Mode, Kernel Space (xkseg) 

In Kernel mode, when KX = 1 in the Status register and bits 63:62 of the
64-bit virtual address are 11₂, the address space selected is one of the
following:

• kernel virtual space, xkseg, the current kernel virtual space; the
  virtual address is extended with the contents of the 8-bit ASID
  field to form a unique virtual address

• one of the four 32-bit kernel compatibility spaces, as described
  in the next section.

64-bit Kernel Mode, Compatibility Spaces (ckseg1:0, cksseg, ckseg3)

In Kernel mode, when KX = 1 in the Status register, bits 63:62 of the 64-bit
virtual address are 11₂, and bits 61:31 of the virtual address equal -1, the
lower two bytes of address, as shown in Figure 4-6, select one of the
following 512-Mbyte compatibility spaces.

• ckseg0. This 64-bit virtual address space is an unmapped
  region, compatible with the 32-bit address model kseg0. The K0
  field of the Config register, described in this chapter, controls
  cacheability and coherency.

• ckseg1. This 64-bit virtual address space is an unmapped and
  uncached region, compatible with the 32-bit address model
  kseg1.

• cksseg. This 64-bit virtual address space is the current
  supervisor virtual space, compatible with the 32-bit address
  model ksseg.

• ckseg3. This 64-bit virtual address space is kernel virtual space,
  compatible with the 32-bit address model kseg3.

4.3 System Control Coprocessor

The System Control Coprocessor (CP0) is implemented as an integral part
of the CPU, and supports memory management, address translation,
exception handling, and other privileged operations. CP0 contains the
registers shown in Figure 4-7 plus a 48-entry TLB. The sections that follow
describe how the processor uses the memory management-related
registers†.

Each CP0 register has a unique number that identifies it; this number is
referred to as the register number. For instance, the PageMask register is
register number 5.





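[Figure 4-7 CP0 Registers and the TLB. The figure shows the TLB (entries 0
through 47, each with bits 127:0) alongside the CP0 registers, each
labeled with its register number (marked *): EntryLo0 (2*), EntryLo1 (3*),
EntryHi (10*), Index (0*), PageMask (5*), Context (4*), and BadVAddr (8*).
The "safe" TLB entries are marked (see the Random register and the
contents of the Wired register). The legend distinguishes registers used
with the memory management system from registers used with exception
processing; see Chapter 5 for details of the latter.]

† For a description of CP0 data dependencies and hazards, please see Appendix F.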
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Format of a TLB Entry 



Figure 4-8 shows the TLB entry formats for both 32- and 64-bit modes.
Each field of an entry has a corresponding field in the EntryHi, EntryLo0,
EntryLo1, or PageMask registers, as shown in Figures 4-9 and 4-10; for
example the Mask field of the TLB entry is also held in the PageMask
register.



[Figure 4-8 Format of a TLB Entry. In 32-bit mode the R4000 processor uses
a 128-bit TLB entry (bits 127:0); in 64-bit mode it uses a 256-bit TLB
entry (bits 255:0). The entry holds the MASK, VPN2, G, and ASID fields
together with the PFN, C, D, and V fields.]

The formats of the EntryHi, EntryLo0, EntryLo1, and PageMask registers are
nearly the same as the TLB entry. The one exception is the Global field
(G bit), which is used in the TLB, but is reserved in the EntryHi register.
Figures 4-9 and 4-10 describe the TLB entry fields shown in Figure 4-8.



                         PageMask Register

   31            25 24                13 12                 0
  +----------------+--------------------+--------------------+
  |       0        |        MASK        |         0          |
  +----------------+--------------------+--------------------+
          7                  12                   13

Mask .... Page comparison mask.
0 ....... Reserved. Must be written as zeroes, and returns zeroes when read.

                         EntryHi Register

32-bit Mode
   31                                 13 12       8 7        0
  +-------------------------------------+----------+----------+
  |                VPN2                 |    0     |   ASID   |
  +-------------------------------------+----------+----------+
                   19                        5          8

64-bit Mode
   63 62 61          40 39               13 12     8 7        0
  +-----+--------------+-------------------+--------+----------+
  |  R  |     FILL     |       VPN2        |   0    |   ASID   |
  +-----+--------------+-------------------+--------+----------+
     2        22                27              5         8

VPN2 .... Virtual page number divided by two (maps to two pages).
ASID .... Address space ID field. An 8-bit field that lets multiple
          processes share the TLB; each process has a distinct mapping of
          otherwise identical virtual page numbers.
R ....... Region (00 -> user, 01 -> supervisor, 11 -> kernel), used to
          match vAddr(63:62).
Fill .... Reserved. 0 on read; ignored on write.
0 ....... Reserved. Must be written as zeroes, and returns zeroes when read.

Figure 4-9 Fields of the PageMask and EntryHi Registers

PFN ..... Page frame number; the upper bits of the physical address.
C ....... Specifies the TLB page coherency attribute; see Table 4-6.
D ....... Dirty. If this bit is set, the page is marked as dirty and,
          therefore, writable. This bit is actually a write-protect bit
          that software can use to prevent alteration of data.
V ....... Valid. If this bit is set, it indicates that the TLB entry is
          valid; otherwise, a TLBL or TLBS miss occurs.
G ....... Global. If this bit is set in both Lo0 and Lo1, then the
          processor ignores the ASID during TLB lookup.
0 ....... Reserved. Must be written as zeroes, and returns zeroes when read.

Figure 4-10 Fields of the EntryLo0 and EntryLo1 Registers
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The TLB page coherency attribute (C) bits specify whether references to 
the page should be cached; if cached, the algorithm selects between several 
coherency attributes. Table 4-6 shows the coherency attributes selected by 
the C bits. 



Table 4-6 TLB Page Coherency (C) Bit Values

C(5:3) Value  Page Coherency Attribute
------------  ------------------------------------------------
0             Reserved
1             Reserved
2             Uncached
3             Cacheable noncoherent (noncoherent)
4             Cacheable coherent exclusive (exclusive)
5             Cacheable coherent exclusive on write (sharable)
6             Cacheable coherent update on write (update)
7             Reserved
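
To make the field packing concrete, the following C sketch assembles
32-bit-mode EntryLo and EntryHi values from their fields. It is an
illustration only: the function and constant names are hypothetical. The
C field position (bits 5:3) comes from Table 4-6 and the EntryHi layout
from Figure 4-9; the positions assumed here for D (bit 2), V (bit 1),
G (bit 0), and a 24-bit PFN starting at bit 6 should be checked against
Figure 4-10 before use.

```c
#include <stdint.h>

/* C field values from Table 4-6. */
enum {
    C_UNCACHED    = 2,
    C_NONCOHERENT = 3,
    C_EXCLUSIVE   = 4,
    C_SHARABLE    = 5,
    C_UPDATE      = 6
};

/* Build an EntryLo0/EntryLo1 value (32-bit mode).
 * Assumed layout: PFN 29:6, C 5:3, D 2, V 1, G 0. */
uint32_t make_entrylo(uint32_t pfn, unsigned c, int dirty, int valid, int global)
{
    return ((pfn & 0xFFFFFFu) << 6)
         | ((uint32_t)(c & 0x7u) << 3)
         | ((dirty  ? 1u : 0u) << 2)
         | ((valid  ? 1u : 0u) << 1)
         |  (global ? 1u : 0u);
}

/* Build a 32-bit-mode EntryHi value: VPN2 in bits 31:13, ASID in 7:0,
 * bits 12:8 written as zero. */
uint32_t make_entryhi32(uint32_t vaddr, uint8_t asid)
{
    return (vaddr & 0xFFFFE000u) | asid;
}
```

Since the G bit takes effect only when set in both EntryLo0 and EntryLo1,
a caller building a global mapping would pass global = 1 for both halves
of the page pair.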



CP0 Registers

The following sections describe the CP0 registers, shown in Figure 4-7,
that are assigned specifically as a software interface with memory
management (each register is followed by its register number in
parentheses).

• Index register (CP0 register number 0)
• Random register (1)
• EntryLo0 (2) and EntryLo1 (3) registers
• PageMask register (5)
• Wired register (6)
• EntryHi register (10)
• PRId register (15)
• Config register (16)
• LLAddr register (17)
• TagLo (28) and TagHi (29) registers






Index Register (0) 

The Index register is a 32-bit, read/write register containing six bits to
index an entry in the TLB. The high-order bit of the register shows the 
success or failure of a TLB Probe (TLBP) instruction. 

The Index register also specifies the TLB entry affected by TLB Read 
(TLBR) or TLB Write Index (TLBWI) instructions. 

Figure 4-11 shows the format of the Index register; Table 4-7 describes the 
Index register fields. 



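                          Index Register

   31  30                                     6 5            0
  +---+----------------------------------------+--------------+
  | P |                   0                    |    Index     |
  +---+----------------------------------------+--------------+
    1                     25                          6

Figure 4-11 Index Register

Table 4-7 Index Register Field Descriptions

Field  Description
-----  --------------------------------------------------------------
P      Probe failure. Set to 1 when the previous TLBProbe (TLBP)
       instruction was unsuccessful.
Index  Index to the TLB entry affected by the TLBRead and TLBWrite
       instructions.
0      Reserved. Must be written as zeroes, and returns zeroes when
       read.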
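
A short C sketch of how software might interpret a value read from the
Index register after a TLBP instruction follows. The helper names are
hypothetical; the bit positions are those of Figure 4-11.

```c
#include <stdint.h>

/* Returns 1 if the preceding TLB probe found a match (P bit, bit 31, clear). */
int tlb_probe_hit(uint32_t index_reg)
{
    return (index_reg >> 31) == 0u;
}

/* Extracts the 6-bit TLB entry index (bits 5:0). */
unsigned tlb_probe_index(uint32_t index_reg)
{
    return (unsigned)(index_reg & 0x3Fu);
}
```

The index is meaningful only when the probe succeeded, so a caller would
test tlb_probe_hit before using tlb_probe_index.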
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Random Register (1) 

The Random register is a read-only register of which six bits index an entry
in the TLB. This register decrements as each instruction executes, and its
values range between an upper and a lower bound, as follows:

• A lower bound is set by the number of TLB entries reserved for
  exclusive use by the operating system (the contents of the
  Wired register).

• An upper bound is set by the total number of TLB entries (47
  maximum).

The Random register specifies the entry in the TLB that is affected by the
TLB Write Random instruction. The register does not need to be read for
this purpose; however, the register is readable to verify proper operation
of the processor.

To simplify testing, the Random register is set to the value of the upper
bound upon system reset. This register is also set to the upper bound
when the Wired register is written.

Figure 4-12 shows the format of the Random register; Table 4-8 describes
the Random register fields.
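
The interaction between the Random and Wired registers can be modeled
with a small C sketch. This is a behavioral illustration, not a
description of the hardware implementation: the type and function names
are hypothetical, and the wrap from the lower bound back to the upper
bound is an assumption drawn from the statement that the value stays
between the two bounds.

```c
#include <stdint.h>

#define TLB_LAST_ENTRY 47u   /* upper bound given in the text */

typedef struct {
    uint32_t random;  /* models Random register bits 5:0 */
    uint32_t wired;   /* models Wired register bits 5:0 */
} tlb_counters_t;

/* System reset: Wired is 0 and Random holds the upper bound. */
void tlb_counters_reset(tlb_counters_t *t)
{
    t->wired = 0u;
    t->random = TLB_LAST_ENTRY;
}

/* Writing Wired also sets Random to the upper bound. */
void tlb_write_wired(tlb_counters_t *t, uint32_t value)
{
    t->wired = value & 0x3Fu;
    t->random = TLB_LAST_ENTRY;
}

/* One instruction executes: Random decrements, and (assumed) wraps to
 * the upper bound once it has reached the lower bound. */
void tlb_step_instruction(tlb_counters_t *t)
{
    if (t->random <= t->wired)
        t->random = TLB_LAST_ENTRY;
    else
        t->random--;
}
```

In this model a TLB Write Random never selects an entry below the Wired
value, which is how wired entries are protected from random replacement.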

                          Random Register

   31                                         6 5            0
  +--------------------------------------------+--------------+
  |                     0                      |    Random    |
  +--------------------------------------------+--------------+
                        26                            6

Figure 4-12 Random Register

Table 4-8 Random Register Field Descriptions

Field   Description
------  -------------------------------------------------------------
Random  TLB Random index
0       Reserved. Must be written as zeroes, and returns zeroes when
        read.

EntryLo0 (2) and EntryLo1 (3) Registers

The EntryLo register consists of two registers that have identical formats:

• EntryLo0 is used for even virtual pages.

• EntryLo1 is used for odd virtual pages.

The EntryLo0 and EntryLo1 registers are read/write registers. They hold
the physical page frame number (PFN) of the TLB entry for even and odd
pages, respectively, when performing TLB read and write operations.
Figure 4-10 shows the format of these registers.

PageMask Register (5) 

The PageMask register is a read/write register used for reading from or
writing to the TLB; it holds a comparison mask that sets the variable page
size for each TLB entry, as shown in Table 4-9.

TLB read and write operations use this register as either a source or a
destination; when virtual addresses are presented for translation into
physical addresses, the corresponding bits in the TLB identify which virtual
address bits among bits 24:13 are used in the comparison. When the Mask
field is not one of the values shown in Table 4-9, the operation of the TLB
is undefined.



Table 4-9 Mask Field Values for Page Sizes

                                  Bit
Page Size    24  23  22  21  20  19  18  17  16  15  14  13
-----------  --  --  --  --  --  --  --  --  --  --  --  --
4 Kbytes      0   0   0   0   0   0   0   0   0   0   0   0
16 Kbytes     0   0   0   0   0   0   0   0   0   0   1   1
64 Kbytes     0   0   0   0   0   0   0   0   1   1   1   1
256 Kbytes    0   0   0   0   0   0   1   1   1   1   1   1
1 Mbyte       0   0   0   0   1   1   1   1   1   1   1   1
4 Mbytes      0   0   1   1   1   1   1   1   1   1   1   1
16 Mbytes     1   1   1   1   1   1   1   1   1   1   1   1
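
The pattern in Table 4-9 can be computed rather than looked up: each step
up in page size multiplies the size by four and sets two more mask bits.
The C sketch below derives the PageMask register value for a requested
page size and rejects sizes not listed in the table. The function name
pagemask_for_size is hypothetical.

```c
#include <stdint.h>

/* Stores the PageMask register value (Mask field in bits 24:13) for a
 * page size listed in Table 4-9 and returns 1. Returns 0 for any other
 * size, since the text says TLB operation is then undefined. */
int pagemask_for_size(uint32_t page_bytes, uint32_t *pagemask)
{
    uint32_t size;

    /* Legal sizes are 4 Kbytes times a power of four, up to 16 Mbytes. */
    for (size = 4096u; size <= (16u << 20); size <<= 2) {
        if (size == page_bytes) {
            *pagemask = ((size >> 12) - 1u) << 13;
            return 1;
        }
    }
    return 0;
}
```

For a 16-Kbyte page this yields 0x0000 6000 (bits 14:13 set), matching the
second row of the table.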
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Wired Register (6) 

The Wired register is a read/write register that specifies the boundary
between the wired and random entries of the TLB as shown in Figure 4-13.
Wired entries are fixed, nonreplaceable entries, which cannot be
overwritten by a TLB Write Random operation. Random entries can be
overwritten.

[Figure 4-13 Wired Register Boundary. The figure shows the TLB with entry
47 at the top. The Wired register marks the boundary: entries from the
Wired register value up to 47 form the range of Random entries, and the
entries below it form the range of Wired entries.]

The Wired register is set to 0 upon system reset. Writing this register also
sets the Random register to the value of its upper bound (see Random
register, above). Figure 4-14 shows the format of the Wired register; Table
4-10 describes the register fields.

                          Wired Register

   31                                         6 5            0
  +--------------------------------------------+--------------+
  |                     0                      |    Wired     |
  +--------------------------------------------+--------------+
                        26                            6

Figure 4-14 Wired Register

Table 4-10 Wired Register Field Descriptions

Field  Description
-----  --------------------------------------------------------------
Wired  TLB Wired boundary
0      Reserved. Must be written as zeroes, and returns zeroes when
       read.

EntryHi Register (CP0 Register 10)

The EntryHi register holds the high-order bits of a TLB entry for TLB read
and write operations.

The EntryHi register is accessed by the TLB Probe, TLB Write Random,
TLB Write Indexed, and TLB Read Indexed instructions.

Figure 4-9 shows the format of this register.

When either a TLB refill, TLB invalid, or TLB modified exception occurs,
the EntryHi register is loaded with the virtual page number (VPN2) and
the ASID of the virtual address that did not have a matching TLB entry.
(See Chapter 5 for more information about these exceptions.)

Processor Revision Identifier (PRId) Register (15) 

The 32-bit, read-only Processor Revision Identifier (PRId) register contains
information identifying the implementation and revision level of the CPU
and CP0. Figure 4-15 shows the format of the PRId register; Table 4-11
describes the PRId register fields.

                           PRId Register

   31                       16 15             8 7             0
  +---------------------------+----------------+---------------+
  |             0             |      Imp       |      Rev      |
  +---------------------------+----------------+---------------+
               16                     8                8

Figure 4-15 Processor Revision Identifier Register Format

Table 4-11 PRId Register Fields

Field  Description
-----  --------------------------------------------------------------
Imp    Implementation number
Rev    Revision number
0      Reserved. Must be written as zeroes, and returns zeroes when
       read.

The low-order byte (bits 7:0) of the PRId register is interpreted as a revision
number, and the high-order byte (bits 15:8) is interpreted as an
implementation number. The implementation number of the R4000
processor is 0x04. The contents of the high-order halfword (bits 31:16) of
the register are reserved.

The revision number is stored as a value in the form y.x, where y is a major
revision number in bits 7:4 and x is a minor revision number in bits 3:0.

The revision number can distinguish some chip revisions; however, there
is no guarantee that changes to the chip will necessarily be reflected in the
PRId register, or that changes to the revision number necessarily reflect
real chip changes. For this reason, these values are not listed and software
should not rely on the revision number in the PRId register to characterize
the chip.
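
The field split described above can be expressed as a short C sketch.
The type prid_fields_t and the function decode_prid are hypothetical
names; the sketch only separates the fields and makes no claim about
which revision values actual parts report.

```c
#include <stdint.h>

typedef struct {
    unsigned imp;    /* implementation number, bits 15:8 */
    unsigned major;  /* major revision y, bits 7:4 */
    unsigned minor;  /* minor revision x, bits 3:0 */
} prid_fields_t;

/* Split a PRId register value into its documented fields. */
prid_fields_t decode_prid(uint32_t prid)
{
    prid_fields_t f;
    f.imp   = (unsigned)((prid >> 8) & 0xFFu);
    f.major = (unsigned)((prid >> 4) & 0x0Fu);
    f.minor = (unsigned)(prid & 0x0Fu);
    return f;
}
```

In keeping with the caution above, a caller would typically test only
imp == 0x04 to recognize the R4000 implementation and treat the revision
fields as informational.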

Config Register (16) 

The Config register specifies various configuration options selected on 
R4000 processors; Table 4-12 lists these options. 

Some configuration options, as defined by Config bits 31:6, are set by the 
hardware during reset and are included in the Config register as read-only 
status bits for the software to access. Other configuration options are 
read/write (as indicated by Config register bits 5:0) and controlled by 
software; on reset these fields are undefined. 

Certain configurations have restrictions. The Config register should be 
initialized by software before caches are used. Caches should be written 
back to memory before line sizes are changed, and caches should be 
reinitialized after any change is made. 

Figure 4-16 shows the format of the Config register; Table 4-12 describes 
the Config register fields. 



Config Register

Bits    Field  Width
------  -----  -----
31      CM     1
30:28   EC     3
27:24   EP     4
23:22   SB     2
21      SS     1
20      SW     1
19:18   EW     2
17      SC     1
16      SM     1
15      BE     1
14      EM     1
13      EB     1
12      0      1
11:9    IC     3
8:6     DC     3
5       IB     1
4       DB     1
3       CU     1
2:0     K0     3

Figure 4-16 Config Register Format

Table 4-12 Config Register Fields

Field  Description
-----  ------------------------------------------------------------------
CM     Master-Checker Mode (1 -> Master/Checker Mode is enabled).

EC     System clock ratio:
         0 -> processor clock frequency divided by 2
         1 -> processor clock frequency divided by 3
         2 -> processor clock frequency divided by 4
         3 -> processor clock frequency divided by 6 (R4400 processor only)
         4 -> processor clock frequency divided by 8 (R4400 processor only)

EP     Transmit data pattern (pattern for write-back data):
         0 -> D         Doubleword every cycle
         1 -> DDx       2 Doublewords every 3 cycles
         2 -> DDxx      2 Doublewords every 4 cycles
         3 -> DxDx      2 Doublewords every 4 cycles
         4 -> DDxxx     2 Doublewords every 5 cycles
         5 -> DDxxxx    2 Doublewords every 6 cycles
         6 -> DxxDxx    2 Doublewords every 6 cycles
         7 -> DDxxxxxx  2 Doublewords every 8 cycles
         8 -> DxxxDxxx  2 Doublewords every 8 cycles

SB     Secondary Cache line size:
         0 -> 4 words
         1 -> 8 words
         2 -> 16 words
         3 -> 32 words

SS     Split Secondary Cache Mode
         0 -> instruction and data mixed in secondary cache (joint cache)
         1 -> instruction and data separated by SCAddr(17)

SW     Secondary Cache port width
         0 -> 128-bit data path to S-cache
         1 -> Reserved

EW     System Port width
         0 -> 64-bit
         1, 2, 3 -> Reserved

SC     Secondary Cache present
         0 -> S-cache present
         1 -> no S-cache present

SM     Dirty Shared coherency state
         0 -> Dirty Shared coherency state is enabled
         1 -> Dirty Shared state is disabled

BE     BigEndianMem
         0 -> kernel and memory are little endian
         1 -> kernel and memory are big endian

EM     ECC mode enable
         0 -> ECC mode enabled
         1 -> parity mode enabled

EB     Block ordering
         0 -> sequential
         1 -> sub-block

0      Reserved. Must be written as zeroes, returns zeroes when read.

IC     Primary I-cache Size (I-cache size = 2^(12+IC) bytes). In the R4000
       processor, this is set to 8 Kbytes; in the R4400 processor, this is
       set to 16 Kbytes.

DC     Primary D-cache Size (D-cache size = 2^(12+DC) bytes). In the R4000
       processor, this is set to 8 Kbytes; in the R4400 processor, this is
       set to 16 Kbytes.

IB     Primary I-cache line size
         0 -> 16 bytes
         1 -> 32 bytes

DB     Primary D-cache line size
         0 -> 16 bytes
         1 -> 32 bytes

CU     Update on Store Conditional
         0 -> Store Conditional uses coherency algorithm specified by TLB
         1 -> SC uses cacheable coherent update on write

K0     kseg0 coherency algorithm (see EntryLo0 and EntryLo1 registers and
       the C field of Table 4-6)
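
As an example of reading the Config fields, the C sketch below extracts
the primary cache geometry and the kseg0 coherency algorithm from a
Config register value. The type and function names are hypothetical. The
sketch assumes the bit positions IC = 11:9, DC = 8:6, IB = 5, DB = 4, and
K0 = 2:0 from the Config register format, and the size rule
2^(12+IC) bytes, which agrees with the 8-Kbyte and 16-Kbyte sizes quoted
in Table 4-12.

```c
#include <stdint.h>

typedef struct {
    uint32_t icache_bytes;       /* primary I-cache size */
    uint32_t dcache_bytes;       /* primary D-cache size */
    uint32_t icache_line_bytes;  /* 16 or 32 */
    uint32_t dcache_line_bytes;  /* 16 or 32 */
    unsigned k0;                 /* kseg0 coherency algorithm, Table 4-6 encoding */
} config_caches_t;

/* Decode the primary cache fields of a Config register value. */
config_caches_t decode_config_caches(uint32_t config)
{
    config_caches_t c;
    c.icache_bytes      = 1u << (12u + ((config >> 9) & 0x7u));  /* IC, bits 11:9 */
    c.dcache_bytes      = 1u << (12u + ((config >> 6) & 0x7u));  /* DC, bits 8:6 */
    c.icache_line_bytes = (config & (1u << 5)) ? 32u : 16u;      /* IB, bit 5 */
    c.dcache_line_bytes = (config & (1u << 4)) ? 32u : 16u;      /* DB, bit 4 */
    c.k0                = (unsigned)(config & 0x7u);             /* K0, bits 2:0 */
    return c;
}
```

With IC = DC = 1 the decoded sizes are 8 Kbytes each, the R4000 values
given in the table.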






Load Linked Address (LLAddr) Register (17) 

The read/write Load Linked Address (LLAddr) register contains the physical
address read by the most recent Load Linked instruction. 

This register is for diagnostic purposes only, and serves no function 
during normal operation. 

Figure 4-17 shows the format of the LLAddr register; PAddr represents bits 
of the physical address, PA(35:4). 



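                          LLAddr Register

   31                                                        0
  +-----------------------------------------------------------+
  |                        PAddr(35:4)                        |
  +-----------------------------------------------------------+
                               32

Figure 4-17 LLAddr Register Format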



Cache Tag Registers [TagLo (28) and TagHi (29)] 

The TagLo and TagHi registers are 32-bit read/write registers that hold
either the primary cache tag and parity, or the secondary cache tag and
ECC during cache initialization, cache diagnostics, or cache error
processing. The Tag registers are written by the CACHE and MTC0
instructions.

The P and ECC fields of these registers are ignored on Index Store Tag
operations. Parity and ECC are computed by the store operation.

Figure 4-18 shows the format of these registers for primary cache
operations. Figure 4-19 shows the format of these registers for secondary
cache operations.

Table 4-13 lists the field definitions of the TagLo and TagHi registers.

TagLo
   31                                8 7      6 5          1   0
  +-----------------------------------+--------+------------+---+
  |              PTagLo               | PState |     0      | P |
  +-----------------------------------+--------+------------+---+
                   24                     2          5        1

TagHi
   31                                                        0
  +-----------------------------------------------------------+
  |                         Undefined                         |
  +-----------------------------------------------------------+
                               32

Figure 4-18 TagLo and TagHi Register (P-cache) Formats

[Figure 4-19 TagLo and TagHi Register (S-cache) Formats. For secondary
cache operations TagLo holds the STagLo, SState, VIndex, and ECC fields
described in Table 4-13; TagHi (bits 31:0) is undefined.]

Table 4-13 Cache Tag Register Fields

Field      Description
---------  -----------------------------------------------------------
PTagLo     Specifies the physical address bits 35:12
PState     Specifies the primary cache state
P          Specifies the primary tag even parity bit
STagLo     Specifies the physical address bits 35:17
SState     Specifies the secondary cache state
VIndex     Specifies the virtual index of the associated Primary cache
           line, vAddr(14:12)
ECC        ECC for the STag, SState, and VIndex fields
0          Reserved. Must be written as zeroes, and returns zeroes when
           read.
Undefined  The TagHi register should not be used.
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Virtual-to-Physical Address Translation Process 

During virtual-to-physical address translation, the CPU compares the 
8-bit ASID (if the Global bit, G, is not set) of the virtual address to the ASID 
of the TLB entry to see if there is a match. One of the following
comparisons is also made:

• In 32-bit mode, the highest 7-to-19 bits (depending upon the 
page size) of the virtual address are compared to the contents 
of the TLB virtual page number. 

• In 64-bit mode, the highest 15-to-27 bits (depending upon the 
page size) of the virtual address are compared to the contents 
of the TLB virtual page number. 

If a TLB entry matches, the physical address and access control bits (C, D, 
and V) are retrieved from the matching TLB entry. While the V bit of the 
entry must be set for a valid translation to take place, it is not involved in 
the determination of a matching TLB entry. 
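
The matching rule for 32-bit mode can be sketched in C as follows. This
is a simplified software model, not the hardware lookup: the structure
tlb_entry_t and both function names are hypothetical, and the entry stores
its VPN2 and Mask fields in the same bit positions as the EntryHi and
PageMask registers. Note that the V bit plays no part in the match,
as stated above.

```c
#include <stdint.h>

typedef struct {
    uint32_t mask;    /* PageMask-style value, Mask in bits 24:13 */
    uint32_t vpn2;    /* EntryHi-style value, VPN2 in bits 31:13 */
    uint8_t  asid;    /* address space ID */
    int      global;  /* G bit: nonzero means ignore the ASID */
} tlb_entry_t;

/* Returns 1 if the entry matches the virtual address and current ASID.
 * Bits set in the mask are excluded from the VPN2 comparison, so
 * between 19 bits (4-Kbyte pages) and 7 bits (16-Mbyte pages) are
 * compared. */
int tlb_entry_matches(const tlb_entry_t *e, uint32_t vaddr, uint8_t asid)
{
    uint32_t compared = 0xFFFFE000u & ~e->mask;

    if ((vaddr & compared) != (e->vpn2 & compared))
        return 0;
    return e->global || e->asid == asid;
}

/* Returns 0 if the address falls in the even page of the pair, 1 if it
 * falls in the odd page. The selecting bit is the one just above the
 * page offset (bit 12 for 4-Kbyte pages). */
int tlb_page_select(const tlb_entry_t *e, uint32_t vaddr)
{
    uint32_t select_bit = ((e->mask | 0x1FFFu) + 1u) >> 1;

    return (vaddr & select_bit) != 0u;
}
```

After a match, the even or odd half chosen by tlb_page_select supplies
the PFN and the C, D, and V bits that decide whether the access proceeds.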

Figure 4-20 illustrates the TLB address translation process.

[Figure 4-20 TLB Address Translation. The virtual address (input),
consisting of the VPN and ASID, is compared against the TLB entries to
produce the physical address (output). For valid address spaces, see the
section describing Operating Modes in this chapter.]
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TLB Misses 

If there is no TLB entry that matches the virtual address, a TLB miss
exception occurs.† If the access control bits (D and V) indicate that the
access is not valid, a TLB modification or TLB invalid exception occurs. If
the C bits equal 010₂, the physical address that is retrieved accesses main
memory, bypassing the cache.

TLB Instructions 

Table 4-14 lists the instructions that the CPU provides for working with 
the TLB. See Appendix A for a detailed description of these instructions. 

Table 4-14 TLB Instructions

Op Code  Description of Instruction
-------  ------------------------------------------
TLBP     Translation Lookaside Buffer Probe
TLBR     Translation Lookaside Buffer Read
TLBWI    Translation Lookaside Buffer Write Index
TLBWR    Translation Lookaside Buffer Write Random

† TLB miss exceptions are described in Chapter 5.
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5 



This chapter describes the CPU exception processing, including an 
explanation of exception processing, followed by the format and use of 
each CPU exception register. 

The chapter concludes with a description of each exception's cause, 
together with the manner in which the CPU processes and services these 
exceptions. For information about Floating-Point Unit exceptions, see 
Chapter 7. 
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5.1 How Exception Processing Works 

The processor receives exceptions from a number of sources, including
translation lookaside buffer (TLB) misses, arithmetic overflows, I/O
interrupts, and system calls. When the CPU detects one of these
exceptions, the normal sequence of instruction execution is suspended 
and the processor enters Kernel mode (see Chapter 4 for a description of 
system operating modes). 

The processor then disables interrupts and forces execution of a software 
exception processor (called a handler) located at a fixed address. The 
handler saves the context of the processor, including the contents of the 
program counter, the current operating mode (User or Supervisor), and 
the status of the interrupts (enabled or disabled). This context is saved so 
it can be restored when the exception has been serviced. 

When an exception occurs, the CPU loads the Exception Program Counter 
(EPC) register with a location where execution can restart after the 
exception has been serviced. The restart location in the EPC register is the 
address of the instruction that caused the exception or, if the instruction 
was executing in a branch delay slot, the address of the branch instruction 
immediately preceding the delay slot. 
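
The EPC rule in the preceding paragraph amounts to a one-line
calculation, sketched here in C. The function name is hypothetical, and
the sketch relies on MIPS instructions being four bytes long, so the
branch preceding a delay slot sits at the faulting address minus 4.

```c
#include <stdint.h>

/* Restart address loaded into EPC: the faulting instruction itself, or
 * the preceding branch when the fault occurred in a branch delay slot. */
uint32_t epc_restart_address(uint32_t faulting_pc, int in_branch_delay_slot)
{
    return in_branch_delay_slot ? faulting_pc - 4u : faulting_pc;
}
```

Restarting at the branch matters because re-executing only the delay-slot
instruction would lose the branch itself.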

The registers described later in the chapter assist in this exception 
processing by retaining address, cause and status information. 

For a description of the exception handling process, see the description of 
the individual exception contained in this chapter, or the flowcharts at the 
end of this chapter. 
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5.2 Exception Processing Registers 

This section describes the CP0 registers that are used in exception
processing. Table 5-1 lists these registers, along with their numbers; each
register has a unique identification number that is referred to as its register
number. For instance, the ECC register is register number 26. The
remaining CP0 registers are used in memory management, as described in
Chapter 4.

Software examines the CP0 registers during exception processing to
determine the cause of the exception and the state of the CPU at the time
the exception occurred. The registers in Table 5-1 are used in exception
processing, and are described in the sections that follow.



Table 5-1 CP0 Exception Processing Registers

Register Name                                  Reg. No.
---------------------------------------------  --------
Context                                        4
BadVAddr (Bad Virtual Address)                 8
Count                                          9
Compare                                        11
Status                                         12
Cause                                          13
EPC (Exception Program Counter)                14
WatchLo (Memory Reference Trap Address Low)    18
WatchHi (Memory Reference Trap Address High)   19
XContext                                       20
ECC                                            26
CacheErr (Cache Error and Status)              27
ErrorEPC (Error Exception Program Counter)     30



CPU general registers are interlocked and the result of an instruction can
normally be used by the next instruction; if the result is not available right
away, the processor stalls until it is available. CP0 registers and the TLB
are not interlocked, however; there may be some delay before a value
written by one instruction is available to following instructions. For more
information please see Appendix F.
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Context Register (4) 

The Context register is a read/write register containing the pointer to an
entry in the page table entry (PTE) array; this array is an operating system
data structure that stores virtual-to-physical address translations. When
there is a TLB miss, the CPU loads the TLB with the missing translation
from the PTE array. Normally, the operating system uses the Context
register to address the current page map which resides in the
kernel-mapped segment, kseg3. The Context register duplicates some of the
information provided in the BadVAddr register, but the information is
arranged in a form that is more useful for a software TLB exception
handler. Figure 5-1 shows the format of the Context register; Table 5-2
describes the Context register fields.

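                          Context Register

32-bit Mode
   31           23 22                             4 3        0
  +---------------+--------------------------------+----------+
  |    PTEBase    |            BadVPN2             |    0     |
  +---------------+--------------------------------+----------+
          9                       19                     4

64-bit Mode
   63           23 22                             4 3        0
  +---------------+--------------------------------+----------+
  |    PTEBase    |            BadVPN2             |    0     |
  +---------------+--------------------------------+----------+
         41                       19                     4

Figure 5-1 Context Register Format

Table 5-2 Context Register Fields

Field    Description
-------  -------------------------------------------------------------
BadVPN2  This field is written by hardware on a miss. It contains the
         virtual page number (VPN) of the most recent virtual address
         that did not have a valid translation.
PTEBase  This field is a read/write field for use by the operating
         system. It is normally written with a value that allows the
         operating system to use the Context register as a pointer
         into the current PTE array in memory.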



The 19-bit BadVPNI field contains bits 31:13 of the virtual address that 
caused the TLB miss; bit 12 is excluded because a single TLB entry maps 
to an even-odd page pair. For a 4-Kbyte page size, this format can directly 
address the pair-table of 8-byte PTEs. For other page and PTE sizes, 
shifting and masking this value produces the appropriate address. 
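
As a rough illustration of the layout described above, the following C sketch assembles a 32-bit-mode Context value from a PTEBase value and a faulting virtual address. The function name context_value and its parameters are hypothetical; on a real processor the BadVPN2 field is filled in by hardware, so this only models the resulting bit pattern.

```c
#include <stdint.h>

/* Hypothetical helper: models the 32-bit-mode Context register layout
 * (PTEBase in bits 31:23, BadVPN2 in bits 22:4, zeros in bits 3:0). */
uint32_t context_value(uint32_t pte_base, uint32_t bad_vaddr)
{
    /* BadVPN2 is virtual address bits 31:13 (19 bits); bit 12 is dropped
     * because one TLB entry maps an even-odd page pair. */
    uint32_t bad_vpn2 = (bad_vaddr >> 13) & 0x7FFFFu;

    /* PTEBase is a 9-bit field written by the operating system. */
    return ((pte_base & 0x1FFu) << 23) | (bad_vpn2 << 4);
}
```

Because BadVPN2 is placed at bit 4, consecutive page pairs are 16 bytes apart in the resulting value, which matches a table holding two 8-byte PTEs per pair.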
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Bad Virtual Address Register (BadVAddr) (8) 

The Bad Virtual Address register (BadVAddr) is a read-only register that 
displays the most recent virtual address that caused one of the following 
exceptions: TLB Invalid, TLB Modified, TLB Refill, Virtual Coherency 
Data Access, or Virtual Coherency Instruction Fetch. 

Figure 5-2 shows the format of the BadVAddr register. 



BadVAddr Register 

32-bit mode:   bits 31:0   Bad Virtual Address (32 bits) 
64-bit mode:   bits 63:0   Bad Virtual Address (64 bits) 

Figure 5-2 BadVAddr Register Format 

Note: The BadVAddr register does not save any information for bus errors, 
since bus errors are not addressing errors. 



Count Register (9) 



The Count register acts as a timer, incrementing at a constant rate (half the 
maximum instruction issue rate) whether or not an instruction is 
executed, retired, or any forward progress is made through the pipeline. 

This register can be read or written. It can be written for diagnostic 
purposes or system initialization; for example, to synchronize processors. 

Figure 5-3 shows the format of the Count register. 

Count Register 

bits 31:0   Count (32 bits) 

Figure 5-3 Count Register Format 






Compare Register (11) 

The Compare register acts as a timer (see also the Count register); it 
maintains a stable value that does not change on its own. 

When the value of the Count register equals the value of the Compare 
register, interrupt bit IP(7) in the Cause register is set. This causes an 
interrupt as soon as the interrupt is enabled. 

Writing a value to the Compare register, as a side effect, clears the timer 
interrupt. 
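
The interaction between Count and Compare can be sketched as a small software model. This is only an illustration under stated assumptions: the struct timer_model and both functions are hypothetical names, and a real handler would access these registers as CP0 registers rather than as C variables.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical software model of the Count/Compare timer. */
struct timer_model {
    uint32_t count;    /* models the Count register */
    uint32_t compare;  /* models the Compare register */
    bool ip7;          /* models interrupt bit IP(7) in the Cause register */
};

/* Advance Count by one step; set IP(7) when Count equals Compare. */
void timer_tick(struct timer_model *t)
{
    t->count++;  /* unsigned arithmetic wraps at 2^32 */
    if (t->count == t->compare)
        t->ip7 = true;
}

/* Writing Compare clears the timer interrupt as a side effect. */
void timer_write_compare(struct timer_model *t, uint32_t value)
{
    t->compare = value;
    t->ip7 = false;
}
```

A periodic tick handler would typically call something like timer_write_compare with the next deadline, which both schedules the next interrupt and acknowledges the current one.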

For diagnostic purposes, the Compare register is a read/write register. In 
normal use however, the Compare register is write-only. Figure 5-4 shows 
the format of the Compare register. 



Compare Register 

bits 31:0   Compare (32 bits) 

Figure 5-4 Compare Register Format 
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Status Register (12) 

The Status register (SR) is a read/write register that contains the operating 
mode, interrupt enabling, and the diagnostic states of the processor. The 
following list describes the more important Status register fields; Figures 
5-5 and 5-6 show the format of the entire register, including descriptions 
of the fields. 

• The 8-bit Interrupt Mask (IM) field controls the enabling of eight 
interrupt conditions. An interrupt is taken only if interrupts are 
enabled and the corresponding bits are set in both the 
Interrupt Mask field of the Status register and the Interrupt 
Pending field of the Cause register. For more information, refer 
to the Interrupt Pending (IP) field of the Cause register and 
Chapter 15, which describes the interrupts. 

• The 4-bit Coprocessor Usability (CU) field controls the usability 
of 4 possible coprocessors. Regardless of the CU0 bit setting, 
CP0 is always usable in Kernel mode. 

• The 9-bit Diagnostic Status (DS) field is used for self-testing, 
and checks the cache and virtual memory system. 

• The Reverse-Endian (RE) bit, bit 25, reverses the endianness of 
the machine in User mode. The processor can be configured as either 
little-endian or big-endian at system reset; the endianness selected 
at reset is used in Kernel and Supervisor modes, and in User mode 
when the RE bit is 0. Setting the RE bit to 1 inverts the User 
mode endianness. 

Status Register Format 

Figure 5-5 shows the format of the Status register. Table 5-3 describes the 
Status register fields. Figure 5-6 and Table 5-4 provide additional 
information on the Diagnostic Status (DS) field. All bits in the DS field 
except TS are readable and writable. 
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Figure 5-5 Status Register 
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Table 5-3 Status Register Fields 



Field   Description 

CU      Controls the usability of each of the four coprocessor unit 
        numbers. CP0 is always usable when in Kernel mode, 
        regardless of the setting of the CU0 bit. 
        1 -> usable 
        0 -> unusable 

RP      Enables reduced-power operation by reducing the internal 
        clock frequency. The clock divisor is programmable at boot 
        time. 
        0 -> full speed 
        1 -> reduced clock 

FR      Enables additional floating-point registers. 
        0 -> 16 registers 
        1 -> 32 registers 

RE      Reverse-Endian bit, valid in User mode. 

DS      Diagnostic Status field (see Figure 5-6). 

IM      Interrupt Mask: controls the enabling of each of the external, 
        internal, and software interrupts. An interrupt is taken if 
        interrupts are enabled, and the corresponding bits are set in 
        both the Interrupt Mask field of the Status register and the 
        Interrupt Pending field of the Cause register. 
        0 -> disabled 
        1 -> enabled 

KX      Enables 64-bit addressing in Kernel mode. The extended- 
        addressing TLB refill exception is used for TLB misses on 
        kernel addresses. 
        0 -> 32-bit 
        1 -> 64-bit 

SX      Enables 64-bit addressing and operations in Supervisor 
        mode. The extended-addressing TLB refill exception is used 
        for TLB misses on supervisor addresses. 
        0 -> 32-bit 
        1 -> 64-bit 
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Table 5-3 (cont.) Status Register Fields 



Field   Description 

UX      Enables 64-bit addressing and operations in User mode. 
        The extended-addressing TLB refill exception is used for 
        TLB misses on user addresses. 
        0 -> 32-bit 
        1 -> 64-bit 

KSU     Mode bits 
        10₂ -> User 
        01₂ -> Supervisor 
        00₂ -> Kernel 

ERL     Error Level; set by the processor when a Reset, Soft Reset, 
        NMI, or Cache Error exception is taken. 
        0 -> normal 
        1 -> error 

EXL     Exception Level; set by the processor when any exception 
        other than a Reset, Soft Reset, NMI, or Cache Error exception 
        is taken. 
        0 -> normal 
        1 -> exception 

IE      Interrupt Enable 
        0 -> disables interrupts 
        1 -> enables interrupts 
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Diagnostic Status Field 
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Figure 5-6 Status Register DS Field 



Table 5-4 Status Register Diagnostic Status Bits 



Bit     Description 

BEV     Controls the location of TLB refill and general exception 
        vectors. 
        0 -> normal 
        1 -> bootstrap 

TS      1 -> Indicates TLB shutdown has occurred (read-only). 

SR      1 -> Indicates a Reset* signal or NMI has caused a Soft Reset 
        exception. 

CH      Hit (tag match and valid state) or miss indication for last 
        CACHE Hit Invalidate, Hit Write Back Invalidate, Hit Write 
        Back, Hit Set Virtual, or Create Dirty Exclusive for a 
        secondary cache. 
        0 -> miss 
        1 -> hit 

CE      Contents of the ECC register set or modify the check bits of the 
        caches when CE = 1; see description of the ECC register. 

DE      Specifies that cache parity or ECC errors cannot cause 
        exceptions. 
        0 -> parity/ECC remain enabled 
        1 -> disables parity/ECC 

0       Reserved. Must be written as zeroes, and returns zeroes 
        when read. 
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Status Register Modes and Access States 

Fields of the Status register set the modes and access states described in the 
sections that follow. 

Interrupt Enable: Interrupts are enabled when all of the following 
conditions are true: 

• IE = 1 

• EXL = 0 

• ERL = 0 

If these conditions are met, the settings of the IM bits enable the interrupt. 
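
The enable conditions above can be sketched as a C predicate. The bit positions used here are assumptions drawn from the register layout (IE in bit 0, EXL in bit 1, ERL in bit 2, IM in Status bits 15:8, IP in Cause bits 15:8), and the macro and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed Status register bit positions (see Figure 5-5). */
#define SR_IE   (1u << 0)
#define SR_EXL  (1u << 1)
#define SR_ERL  (1u << 2)

/* True when IE = 1, EXL = 0, and ERL = 0. */
bool interrupts_enabled(uint32_t status)
{
    return (status & SR_IE) != 0 && (status & (SR_EXL | SR_ERL)) == 0;
}

/* True when interrupts are enabled and at least one pending interrupt
 * (Cause IP field) is also unmasked (Status IM field). */
bool interrupt_taken(uint32_t status, uint32_t cause)
{
    uint32_t im = (status >> 8) & 0xFFu;
    uint32_t ip = (cause >> 8) & 0xFFu;
    return interrupts_enabled(status) && (im & ip) != 0;
}
```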

Operating Modes: The following CPU Status register bit settings are 
required for User, Kernel, and Supervisor modes (see Chapter 4 for more 
information about operating modes). 

• The processor is in User mode when KSU = 10₂, EXL = 0, and 
ERL = 0. 

• The processor is in Supervisor mode when KSU = 01₂, EXL = 0, 
and ERL = 0. 

• The processor is in Kernel mode when KSU = 00₂, or EXL = 1, 
or ERL = 1. 
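
The three rules above amount to a small decoding function. In this sketch the KSU field is assumed to occupy Status bits 4:3, with EXL in bit 1 and ERL in bit 2; the enum and function names are hypothetical, and treating KSU = 11₂ as a separate reserved value is an assumption of this example, since the text does not define it.

```c
#include <stdint.h>

enum cpu_mode { MODE_KERNEL, MODE_SUPERVISOR, MODE_USER, MODE_RESERVED };

/* Decode the operating mode from a Status register value. */
enum cpu_mode operating_mode(uint32_t status)
{
    uint32_t ksu = (status >> 3) & 0x3u;  /* assumed KSU position: bits 4:3 */
    uint32_t exl = (status >> 1) & 0x1u;
    uint32_t erl = (status >> 2) & 0x1u;

    /* EXL = 1 or ERL = 1 forces Kernel mode regardless of KSU. */
    if (exl || erl)
        return MODE_KERNEL;

    switch (ksu) {
    case 0x0: return MODE_KERNEL;      /* 00 */
    case 0x1: return MODE_SUPERVISOR;  /* 01 */
    case 0x2: return MODE_USER;        /* 10 */
    default:  return MODE_RESERVED;    /* 11: not defined by the text */
    }
}
```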

32- and 64-bit Modes: The following CPU Status register bit settings select 
32- or 64-bit operation for User, Kernel, and Supervisor operating modes. 
Enabling 64-bit operation permits the execution of 64-bit opcodes and 
translation of 64-bit addresses. 64-bit operation for User, Kernel and 
Supervisor modes can be set independently. 

• 64-bit addressing for Kernel mode is enabled when KX = 1. 
64-bit operations are always valid in Kernel mode. 

• 64-bit addressing and operations are enabled for Supervisor 
mode when SX = 1. 

• 64-bit addressing and operations are enabled for User mode 
when UX = 1. 

Kernel Address Space Accesses: Access to the kernel address space is 
allowed when the processor is in Kernel mode. 

Supervisor Address Space Accesses: Access to the supervisor address 
space is allowed when the processor is in Kernel or Supervisor mode, as 
described above under Operating Modes. 
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User Address Space Accesses: Access to the user address space is allowed 
in any of the three operating modes. 

Status Register Reset 

The contents of the Status register are undefined at reset, except for the 
following bits in the Diagnostic Status field: 

• TS = 0 

• ERL and BEV = 1 

The SR bit distinguishes between the Reset exception and the Soft Reset 
exception (caused either by Reset* or Nonmaskable Interrupt [NMI]). 

Cause Register (13) 

The 32-bit read/write Cause register describes the cause of the most recent 
exception. 

Figure 5-7 shows the fields of this register; Table 5-5 describes the Cause 
register fields. A 5-bit exception code (ExcCode) indicates one of the 
causes, as listed in Table 5-6. 

All bits in the Cause register, with the exception of the IP(1:0) bits, are read- 
only; IP(1:0) are used for software interrupts. 



Table 5-5 Cause Register Fields 

Field     Description 

BD        Indicates whether the last exception taken occurred in a branch delay slot. 
          1 -> delay slot 
          0 -> normal 

CE        Coprocessor unit number referenced when a Coprocessor Unusable 
          exception is taken. 

IP        Indicates an interrupt is pending. 
          1 -> interrupt pending 
          0 -> no interrupt 

ExcCode   Exception code field (see Table 5-6) 

0         Reserved. Must be written as zeroes, and returns zeroes when read. 



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Figure 5-7 Cause Register Format 



Table 5-6 Cause Register ExcCode Field 

Exception 
Code Value   Mnemonic   Description 

0            Int        Interrupt 
1            Mod        TLB modification exception 
2            TLBL       TLB exception (load or instruction fetch) 
3            TLBS       TLB exception (store) 
4            AdEL       Address error exception (load or instruction fetch) 
5            AdES       Address error exception (store) 
6            IBE        Bus error exception (instruction fetch) 
7            DBE        Bus error exception (data reference: load or store) 
8            Sys        Syscall exception 
9            Bp         Breakpoint exception 
10           RI         Reserved instruction exception 
11           CpU        Coprocessor Unusable exception 
12           Ov         Arithmetic Overflow exception 
13           Tr         Trap exception 
14           VCEI       Virtual Coherency Exception instruction 
15           FPE        Floating-Point exception 
16-22                   Reserved 
23           WATCH      Reference to WatchHi/WatchLo address 
24-30                   Reserved 
31           VCED       Virtual Coherency Exception data 
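
A handler usually begins by extracting ExcCode from the Cause register and dispatching on it. The sketch below assumes ExcCode occupies Cause bits 6:2, as the Cause assignment in Figure 5-17 indicates; the function names are hypothetical, and the mnemonic strings follow Table 5-6.

```c
#include <stdint.h>

/* Extract the 5-bit ExcCode field (assumed to be Cause bits 6:2). */
unsigned exc_code(uint32_t cause)
{
    return (cause >> 2) & 0x1Fu;
}

/* Map an exception code to its Table 5-6 mnemonic. */
const char *exc_mnemonic(unsigned code)
{
    static const char *const names[32] = {
        [0] = "Int",   [1] = "Mod",   [2] = "TLBL",  [3] = "TLBS",
        [4] = "AdEL",  [5] = "AdES",  [6] = "IBE",   [7] = "DBE",
        [8] = "Sys",   [9] = "Bp",    [10] = "RI",   [11] = "CpU",
        [12] = "Ov",   [13] = "Tr",   [14] = "VCEI", [15] = "FPE",
        [23] = "WATCH", [31] = "VCED",
    };

    /* Codes 16-22 and 24-30 have no entry and are reported as reserved. */
    if (code < 32 && names[code] != 0)
        return names[code];
    return "Reserved";
}
```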
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Exception Program Counter (EPC) Register (14) 

The Exception Program Counter (EPC) is a read/write register that 
contains the address at which processing resumes after an exception has 
been serviced. 

For synchronous exceptions, the EPC register contains either: 

• the virtual address of the instruction that was the direct cause 
of the exception, or 

• the virtual address of the immediately preceding branch or 
jump instruction (when the instruction is in a branch delay 
slot, and the Branch Delay bit in the Cause register is set). 

The processor does not write to the EPC register when the EXL bit in the 
Status register is set to a 1. 
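
One practical consequence of the two cases above is that a handler which needs the address of the instruction that actually caused the exception has to consult the BD bit. A minimal sketch, assuming BD is Cause bit 31 (as Figure 5-17 indicates) and 4-byte instructions; the function name is hypothetical:

```c
#include <stdint.h>

#define CAUSE_BD (1u << 31)  /* assumed position of the Branch Delay bit */

/* Return the address of the instruction that caused the exception.
 * When BD is set, EPC holds the address of the preceding branch or jump,
 * so the instruction in the delay slot is one instruction (4 bytes) later. */
uint64_t faulting_instruction(uint64_t epc, uint32_t cause)
{
    return (cause & CAUSE_BD) ? epc + 4 : epc;
}
```

Execution still resumes at EPC itself, so the branch is re-executed together with its delay slot.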

Figure 5-8 shows the format of the EPC register. 

EPC Register 

32-bit mode:   bits 31:0   EPC (32 bits) 
64-bit mode:   bits 63:0   EPC (64 bits) 

Figure 5-8 EPC Register Format 
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WatchLo (18) and WatchHi (19) Registers 

R4000 processors provide a debugging feature to detect references to a 
selected physical address; load and store operations to the location 
specified by the WatchLo and WatchHi registers cause a Watch exception 
(described later in this chapter). 

Figure 5-9 shows the format of the WatchLo and WatchHi registers; 
Table 5-7 describes the WatchLo and WatchHi register fields. 



WatchLo Register 

    Bits    Width   Field 
    31:3    29      PAddr0 
    2       1       0 
    1       1       R 
    0       1       W 

WatchHi Register 

    Bits    Width   Field 
    31:4    28      0 
    3:0     4       PAddr1 

Figure 5-9 WatchLo and WatchHi Register Formats 



Table 5-7 WatchHi and WatchLo Register Fields 

Field     Description 

PAddr1    Bits 35:32 of the physical address 

PAddr0    Bits 31:3 of the physical address 

R         Trap on load references if set to 1 

W         Trap on store references if set to 1 

0         Reserved. Must be written as zeroes, and returns 
          zeroes when read. 
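
The field layout in Table 5-7 can be illustrated by encoding a 36-bit physical address into the two register values. This is a sketch only: the struct and function names are hypothetical, and the positions of R (bit 1) and W (bit 0) in WatchLo are read from Figure 5-9.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical pair of values to be written to WatchLo and WatchHi. */
struct watch_regs {
    uint32_t lo;  /* PAddr0 (bits 31:3), 0 (bit 2), R (bit 1), W (bit 0) */
    uint32_t hi;  /* PAddr1 (bits 3:0), upper bits zero */
};

/* Encode a watchpoint on the doubleword containing paddr. */
struct watch_regs watch_encode(uint64_t paddr, bool on_load, bool on_store)
{
    struct watch_regs w;

    /* Physical address bits 31:3 stay in place; the low 3 bits are dropped. */
    w.lo = (uint32_t)(paddr & 0xFFFFFFF8u)
         | (on_load ? 0x2u : 0x0u)
         | (on_store ? 0x1u : 0x0u);

    /* Physical address bits 35:32 go into the low 4 bits of WatchHi. */
    w.hi = (uint32_t)((paddr >> 32) & 0xFu);
    return w;
}
```

Since PAddr0 starts at address bit 3, the watched region is an aligned doubleword rather than a single byte.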
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XContext Register (20) 

The read/write XContext register contains a pointer to an entry in the page 
table entry (PTE) array, an operating system data structure that stores 
virtual-to-physical address translations. When there is a TLB miss, the 
operating system software loads the TLB with the missing translation 
from the PTE array. The XContext register duplicates some of the 
information provided in the BadVAddr register, and puts it in a form useful 
for a software TLB exception handler. The XContext register is for use with 
the XTLB refill handler, which loads TLB entries for references to a 64-bit 
address space, and is included solely for operating system use. The 
operating system sets the PTE base field in the register, as needed. 
Normally, the operating system uses the Context register to address the 
current page map, which resides in the kernel-mapped segment kseg3. 
Figure 5-10 shows the format of the XContext register; Table 5-8 describes 
the XContext register fields. 



XContext Register 

    Bits     Width   Field 
    63:33    31      PTEBase 
    32:31    2       R 
    30:4     27      BadVPN2 
    3:0      4       0 

Figure 5-10 XContext Register Format 

The 27-bit BadVPN2 field has bits 39:13 of the virtual address that caused 
the TLB miss; bit 12 is excluded because a single TLB entry maps to an 
even-odd page pair. For a 4-Kbyte page size, this format may be used 
directly to address the pair-table of 8-byte PTEs. For other page and PTE 
sizes, shifting and masking this value produces the appropriate address. 



Table 5-8 XContext Register Fields 

Field      Description 

BadVPN2    The Bad Virtual Page Number/2 field is written by hardware on a miss. It 
           contains the VPN of the most recent invalidly translated virtual address. 

R          The Region field contains bits 63:62 of the virtual address. 
           00₂ = user 
           01₂ = supervisor 
           11₂ = kernel 

PTEBase    The Page Table Entry Base read/write field is normally written with a value 
           that allows the operating system to use the Context register as a pointer into 
           the current PTE array in memory. 
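
The XContext layout can be modeled the same way as the 32-bit Context register. In this sketch the function name and parameters are hypothetical; hardware fills in R and BadVPN2 on a miss, so the code only shows how the fields combine into one 64-bit value.

```c
#include <stdint.h>

/* Hypothetical helper: models the XContext layout (PTEBase in bits 63:33,
 * R in bits 32:31, BadVPN2 in bits 30:4, zeros in bits 3:0). */
uint64_t xcontext_value(uint64_t pte_base, uint64_t bad_vaddr)
{
    uint64_t region   = (bad_vaddr >> 62) & 0x3ull;        /* VA bits 63:62 */
    uint64_t bad_vpn2 = (bad_vaddr >> 13) & 0x7FFFFFFull;  /* VA bits 39:13 */

    return ((pte_base & 0x7FFFFFFFull) << 33)  /* 31-bit PTEBase */
         | (region << 31)
         | (bad_vpn2 << 4);
}
```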
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Error Checking and Correcting (ECC) Register (26) 

The 8-bit Error Checking and Correcting (ECC) register reads or writes either 
secondary-cache data ECC bits or primary-cache data parity bits for cache 
initialization, cache diagnostics, or cache error processing. (Tag ECC and 
parity are loaded from and stored to the TagLo register.) 

The ECC register is loaded by the Index Load Tag CACHE operation. 
Content of the ECC register is: 

• written into the primary data cache on store instructions 
(instead of the computed parity) when the CE bit of the Status 
register is set 

• substituted for the computed instruction parity for the CACHE 
operation Fill 

• XORed into the secondary cache computed ECC for the 
following primary data cache CACHE operations: Index Write 
Back Invalidate, Hit Write Back, and Hit Write Back Invalidate. 

Figure 5-11 shows the format of the ECC register; Table 5-9 describes the 
register fields. 



ECC Register 

    Bits    Width   Field 
    31:8    24      0 
    7:0     8       ECC 

Figure 5-11 ECC Register Format 



Table 5-9 ECC Register Fields 

Field   Description 

ECC     An 8-bit field specifying the ECC bits read from or 
        written to a secondary cache, or the even byte parity bits 
        to be read from or written to a primary cache. 

0       Reserved. Must be written as zeroes, and returns zeroes 
        when read. 
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Cache Error (CacheErr) Register (27) 

The 32-bit read-only CacheErr register processes ECC errors in the 
secondary cache and parity errors in the primary cache. Parity errors 
cannot be corrected. 

All single- and double-bit ECC errors in the secondary cache tag and data 
are detected; single-bit errors in the cache tag are automatically corrected. 
Single-bit ECC errors in the secondary cache data are not automatically 
corrected. 

The CacheErr register holds cache index and status bits that indicate the 
source and nature of the error; it is loaded when a Cache Error exception 
is asserted. 

Figure 5-12 shows the format of the CacheErr register and Table 5-10 
describes the CacheErr register fields. 

CacheErr Register 
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Figure 5-12 CacheErr Register Format 



Table 5-10 CacheErr Register Fields 

Field   Description 

ER      Type of reference 
        0 -> instruction 
        1 -> data 

EC      Cache level of the error 
        0 -> primary 
        1 -> secondary 

ED      Indicates if a data field error occurred 
        0 -> no error 
        1 -> error 

ET      Indicates if a tag field error occurred 
        0 -> no error 
        1 -> error 
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Table 5-10 (cont.) CacheErr Register Fields 


Field   Description 

ES      Indicates the error occurred while accessing primary or secondary cache in 
        response to an external request. 
        0 -> internal reference 
        1 -> external reference 

EE      This bit is set if the error occurred on the SysAD bus. 

EB      This bit is set if a data error occurred in addition to the instruction error 
        (indicated by the remainder of the bits). If so, this requires flushing the 
        data cache after fixing the instruction error. 

EI      This bit is set on a secondary data cache ECC error while refilling the 
        primary cache on a store miss. The ECC handler must first do an Index 
        Store Tag to invalidate the incorrect data from the primary data cache. 

EW      This bit is only available on the R4400 processor. It is set on a 
        multiprocessor cache error when the CacheErr register is already holding 
        the values of a previous cache error. This bit could be set by the processor 
        from the time the CacheErr register is loaded due to an error until the time 
        that an ERET instruction is executed. Once the EW bit is set, it can only be 
        cleared by a reset. The following errors set the EW bit: 

        • Secondary cache tag errors arising from an external request 
          (multibit errors only) 

        • Secondary cache data errors arising from an external update 

        • Primary cache tag errors arising from an external request 

SIdx    Bits pAddr(21:3) of the reference that encountered the error (which is not 
        necessarily the same as the address of the doubleword in error, but is 
        sufficient to locate that doubleword in the secondary cache). 

PIdx    Bits vAddr(14:12) of the doubleword in error (used with SIdx to construct 
        a virtual index for the primary caches). 

0       Reserved. Must be written as zeroes, and returns zeroes when read. 
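
A Cache Error handler typically unpacks these fields before deciding how to recover. The sketch below assumes the bit order implied by the CacheErr assignments in Figures 5-15 and 5-16: the eight flags ER through EI in bits 31:24, EW in bit 23, a zero in bit 22, SIdx in bits 21:3, and PIdx in bits 2:0. The struct and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical decoded view of a CacheErr register value. */
struct cache_err {
    bool er, ec, ed, et, es, ee, eb, ei, ew;
    uint32_t sidx;  /* pAddr(21:3) of the reference */
    uint32_t pidx;  /* vAddr(14:12) of the doubleword in error */
};

struct cache_err cache_err_decode(uint32_t v)
{
    struct cache_err c;

    c.er = (v >> 31) & 1u;  /* 0 = instruction, 1 = data */
    c.ec = (v >> 30) & 1u;  /* 0 = primary, 1 = secondary */
    c.ed = (v >> 29) & 1u;  /* data field error */
    c.et = (v >> 28) & 1u;  /* tag field error */
    c.es = (v >> 27) & 1u;  /* external reference */
    c.ee = (v >> 26) & 1u;  /* error on the SysAD bus */
    c.eb = (v >> 25) & 1u;  /* data error in addition to instruction error */
    c.ei = (v >> 24) & 1u;  /* secondary ECC error on store-miss refill */
    c.ew = (v >> 23) & 1u;  /* R4400 only */
    c.sidx = (v >> 3) & 0x7FFFFu;  /* 19-bit field */
    c.pidx = v & 0x7u;             /* 3-bit field */
    return c;
}
```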
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Error Exception Program Counter (Error EPC) Register (30) 

The ErrorEPC register is similar to the EPC register, except that ErrorEPC 
is used on ECC and parity error exceptions. It is also used to store the 
program counter (PC) on Reset, Soft Reset, and nonmaskable interrupt 
(NMI) exceptions. 

The read/write ErrorEPC register contains the virtual address at which 
instruction processing can resume after servicing an error. This address 
can be: 

• the virtual address of the instruction that caused the exception 

• the virtual address of the immediately preceding branch or 
jump instruction, when the instruction is in a branch delay slot. 

There is no branch delay slot indication for the ErrorEPC register. 

Figure 5-13 shows the format of the ErrorEPC register. 



ErrorEPC Register 

32-bit mode:   bits 31:0   ErrorEPC (32 bits) 
64-bit mode:   bits 63:0   ErrorEPC (64 bits) 

Figure 5-13 ErrorEPC Register Format 
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5.3 Processor Exceptions 

This section describes the processor exceptions: the cause of each 
exception, its processing by the hardware, and its servicing by a handler 
(software). The types of exception, with exception processing operations, 
are described in the next section. 

Exception Types 

This section gives sample exception handler operations for the following 
exception types: 

• reset 

• soft reset 

• nonmaskable interrupt (NMI) 

• cache error 

• remaining processor exceptions 

When the EXL bit in the Status register is 0, either User, Supervisor, or 
Kernel operating mode is specified by the KSU bits in the Status register. 
When the EXL bit is a 1, the processor is in Kernel mode. 

When the processor takes an exception, the EXL bit is set to 1, which means 
the system is in Kernel mode. After saving the appropriate state, the 
exception handler typically changes KSU to Kernel mode and resets the 
EXL bit back to 0. When restoring the state and restarting, the handler 
restores the previous value of the KSU field and sets the EXL bit back to 1. 

Returning from an exception also resets the EXL bit to 0 (see the ERET 
instruction in Appendix A). 

In the following sections, sample hardware processes for various 
exceptions are shown, together with the servicing required by the handler 
(software). 
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Reset Exception Process 

Figure 5-14 shows the Reset exception process. 



T: undefined 
   Random <- TLBENTRIES-1 
   Wired <- 0 
   Config <- CM || EC || EP || SB || SS || SW || EW || SC || SM || BE || EM || EB || 0 || IC 
             || DC || undefined^6 
   ErrorEPC <- RestartPC   /* If the instruction is in a branch delay slot, RestartPC */ 
                           /* holds the value of PC-4, otherwise RestartPC = PC */ 
   if R4400 then 
       CacheErr <- undefined^8 || 0 || undefined^23   /* Set EW bit to 0 */ 
   endif 
   SR <- SR31:23 || 1 || 0 || 0 || SR19:3 || 1 || SR1:0 
   PC <- 0xFFFF FFFF BFC0 0000 

Figure 5-14 Reset Exception Processing 

Cache Error Exception Process 

Figure 5-15 shows the Cache Error exception process. 



T: ErrorEPC <- RestartPC   /* If the instruction is in a branch delay slot, RestartPC */ 
                           /* holds the value of PC-4, otherwise RestartPC = PC */ 
   if R4000 then 
       CacheErr <- ER || EC || ED || ET || ES || EE || EB || EI || 0^2 || SIdx || PIdx 
   else   /* R4400 */ 
       CacheErr <- ER || EC || ED || ET || ES || EE || EB || EI || EW || 0 || SIdx || PIdx 
   endif 
   SR <- SR31:3 || 1 || SR1:0 
   if SR22 = 1 then 
       PC <- 0xFFFF FFFF BFC0 0200 + 0x100 
   else 
       PC <- 0xFFFF FFFF A000 0000 + 0x100 
   endif 

Figure 5-15 Cache Error Exception Processing 
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Soft Reset and NMI Exception Process 

Figure 5-16 shows the Soft Reset and NMI exception process. 



T: ErrorEPC <- RestartPC   /* If the instruction is in a branch delay slot, RestartPC */ 
                           /* holds the value of PC-4, otherwise RestartPC = PC */ 
   SR <- SR31:23 || 1 || 0 || 1 || SR19:3 || 1 || SR1:0 
   if R4400 then 
       CacheErr <- CacheErr31:24 || 0 || CacheErr22:0 
   endif 
   PC <- 0xFFFF FFFF BFC0 0000 

Figure 5-16 Soft Reset and NMI Exception Processing 

General Exception Process 

Figure 5-17 shows the process used for exceptions other than Reset, Soft 
Reset, NMI, and Cache Error. 



T: if SR1 = 0 then   /* if not EXL */ 
       EPC <- RestartPC   /* If the instruction is in a branch delay slot, */ 
                          /* RestartPC holds the value of PC-4, */ 
                          /* otherwise RestartPC = PC */ 
       Cause <- BD || 0 || CE || 0^12 || Cause15:8 || 0 || ExcCode || 0^2 
       if TLBrefill then vector <- 0x000 
       elseif XTLBrefill then vector <- 0x080 
       else /* not a miss */ vector <- 0x180 
   else 
       Cause <- Cause31 || 0 || CE || 0^12 || Cause15:8 || 0 || ExcCode || 0^2 
       vector <- 0x180 
   endif 
   SR <- SR31:2 || 1 || SR0   /* EXL */ 
   if SR22 = 1 then 
       PC <- 0xFFFF FFFF BFC0 0200 + vector 
   else 
       PC <- 0xFFFF FFFF 8000 0000 + vector 
   endif 

Figure 5-17 General Exception Processing (Except Reset, Soft Reset, NMI, and Cache Error) 
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Exception Vector Locations 

The Reset, Soft Reset, and NMI exceptions are always vectored to the 
dedicated Reset exception vector at an uncached and unmapped address. 
Addresses for all other exceptions are a combination of a vector offset and 
a base address. 

The boot-time vectors (when BEV = 1 in the Status register) are at 
uncached and unmapped addresses. During normal operation (when 
BEV = 0) the regular exceptions have vectors in cached address spaces; 
Cache Error is always at an uncached address so that cache error handling 
can bypass a suspect cache. 

Table 5-11 shows the 64-bit-mode vector base address for all exceptions; 
the 32-bit mode address is the low-order 32 bits (for instance, the base 
address for NMI in 32-bit mode is 0xBFC0 0000). 

Table 5-12 shows the vector offset added to the base address to create the 
exception address. 



Table 5-11 Exception Vector Base Addresses 

Exception       BEV = 0                   BEV = 1 

Cache Error     0xFFFF FFFF A000 0000     0xFFFF FFFF BFC0 0200 

Others          0xFFFF FFFF 8000 0000     0xFFFF FFFF BFC0 0200 

Reset, NMI,     0xFFFF FFFF BFC0 0000 (regardless of BEV) 
Soft Reset 



Table 5-12 Exception Vector Offsets 

Exception                                R4000 Processor Vector Offset 

TLB refill, EXL = 0                      0x000 

XTLB refill, EXL = 0 (X = 64-bit TLB)    0x080 

Cache Error                              0x100 

Others                                   0x180 

Reset, Soft Reset, NMI                   none 
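
Combining Tables 5-11 and 5-12, the 64-bit-mode exception address can be computed as base plus offset. The following C sketch is illustrative only; the enum exc_kind and the function exception_vector are hypothetical names, and the treatment of refills taken with EXL = 1 follows the else branch of Figure 5-17.

```c
#include <stdbool.h>
#include <stdint.h>

enum exc_kind {
    EXC_RESET,        /* Reset, Soft Reset, or NMI */
    EXC_CACHE_ERROR,
    EXC_TLB_REFILL,
    EXC_XTLB_REFILL,
    EXC_OTHER
};

/* Compute the 64-bit-mode exception vector address. */
uint64_t exception_vector(enum exc_kind kind, bool bev, bool exl)
{
    uint64_t base, offset;

    /* Reset, Soft Reset, and NMI use a dedicated vector with no offset. */
    if (kind == EXC_RESET)
        return 0xFFFFFFFFBFC00000ull;

    if (bev)
        base = 0xFFFFFFFFBFC00200ull;  /* bootstrap vectors, uncached */
    else if (kind == EXC_CACHE_ERROR)
        base = 0xFFFFFFFFA0000000ull;  /* always uncached */
    else
        base = 0xFFFFFFFF80000000ull;  /* cached */

    switch (kind) {
    case EXC_CACHE_ERROR: offset = 0x100; break;
    case EXC_TLB_REFILL:  offset = exl ? 0x180 : 0x000; break;
    case EXC_XTLB_REFILL: offset = exl ? 0x180 : 0x080; break;
    default:              offset = 0x180; break;
    }
    return base + offset;
}
```

The 32-bit-mode address is the low-order 32 bits of the returned value.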
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Priority of Exceptions 

The remainder of this chapter describes exceptions in the order of their 
priority shown in Table 5-13 (with certain of the exceptions, such as the 
TLB exceptions and Instruction/Data exceptions, grouped together for 
convenience). While more than one exception can occur for a single 
instruction, only the exception with the highest priority is reported. 

Table 5-13 Exception Priority Order 

Reset (highest priority) 
Soft Reset caused by Reset* signal 
Nonmaskable Interrupt (NMI) (Soft Reset exception caused by NMI) 
Address error - Instruction fetch 
TLB refill - Instruction fetch 
TLB invalid - Instruction fetch 
Cache error - Instruction fetch 
Virtual Coherency - Instruction fetch 
Bus error - Instruction fetch 
Integer overflow, Trap, System Call, Breakpoint, Reserved 
    Instruction, Coprocessor Unusable, or Floating-Point Exception 
Address error - Data access 
TLB refill - Data access 
TLB invalid - Data access 
TLB modified - Data write 
Cache error - Data access 
Watch 
Virtual Coherency - Data access 
Bus error - Data access 
Interrupt (lowest priority) 
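
The rule that only the highest-priority exception is reported can be modeled by listing the entries of Table 5-13 in order and scanning a set of simultaneously raised conditions from the top. This is a software illustration, not a description of the hardware implementation; the enum names and the bitmask representation are assumptions of the example.

```c
#include <stdint.h>

/* Table 5-13 in priority order: a lower value means a higher priority. */
enum exc_priority {
    PRI_RESET = 0,
    PRI_SOFT_RESET,
    PRI_NMI,
    PRI_ADDR_ERR_IFETCH,
    PRI_TLB_REFILL_IFETCH,
    PRI_TLB_INVALID_IFETCH,
    PRI_CACHE_ERR_IFETCH,
    PRI_VCE_IFETCH,
    PRI_BUS_ERR_IFETCH,
    PRI_INSTRUCTION,       /* overflow, trap, syscall, breakpoint, RI, CpU, FPE */
    PRI_ADDR_ERR_DATA,
    PRI_TLB_REFILL_DATA,
    PRI_TLB_INVALID_DATA,
    PRI_TLB_MODIFIED,
    PRI_CACHE_ERR_DATA,
    PRI_WATCH,
    PRI_VCE_DATA,
    PRI_BUS_ERR_DATA,
    PRI_INTERRUPT,
    PRI_COUNT
};

/* Given a bitmask with bit i set for each raised condition i, return the
 * one that would be reported, or -1 if no condition is raised. */
int highest_priority(uint32_t pending)
{
    for (int i = 0; i < PRI_COUNT; i++) {
        if (pending & (1u << i))
            return i;
    }
    return -1;
}
```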



Generally speaking, the exceptions described in the following sections are 
handled ("processed") by hardware; these exceptions are then serviced by 
software. 
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Reset Exception 
Cause 

The Reset exception occurs when the ColdReset*† signal is asserted and 
then deasserted. This exception is not maskable. 

Processing 

The CPU provides a special interrupt vector for this exception: 

• location 0xBFC0 0000 in 32-bit mode 

• location 0xFFFF FFFF BFC0 0000 in 64-bit mode 

The Reset vector resides in unmapped and uncached CPU address space, 
so the hardware need not initialize the TLB or the cache to process this 
exception. It also means the processor can fetch and execute instructions 
while the caches and virtual memory are in an undefined state. 

The contents of all registers in the CPU are undefined when this exception 
occurs, except for the following register fields: 

• In the Status register, SR and TS are cleared to 0, and ERL and 
BEV are set to 1. All other bits are undefined. 

• Config register is initialized with the boot mode bits read from 
the serial input (see Figure 5-14). 

• The Random register is initialized to the value of its upper 
bound. 

• The Wired register is initialized to 0. 

• The EW bit in the CacheErr register is cleared (R4400 only). 

Reset exception processing is shown in Figure 5-14. 

Servicing 

The Reset exception is serviced by: 

• initializing all processor registers, coprocessor registers, caches, 
and the memory system 

• performing diagnostic tests 

• bootstrapping the operating system 



† In the following sections, and indeed throughout this book, a signal followed by an asterisk, 
such as Reset*, is low active. 






Soft Reset Exception 
Cause 

The Soft Reset exception occurs in response to either the Reset* input 
signal or a Nonmaskable Interrupt (NMI)†. 

The NMI is caused either by an assertion of the NMI* signal or an external 
write to the Int*[6] bit of the Interrupt register. 

This exception is not maskable. 
Processing 

Regardless of the cause, when this exception occurs the SR bit of the Status 
register is set, distinguishing this exception from a Reset exception. 

The processor does not indicate any distinction between an exception 
caused by the Reset* signal or the NMI* signal. 

• An exception caused by an NMI can only be taken if the 
processor is processing instructions; it is taken at the 
instruction boundary. It does not abort any state machines, 
preserving the state of the processor for diagnosis. 

• An exception caused by assertion of Reset* performs a subset 
of the full reset initialization. After a processor is completely 
initialized by a Reset exception (caused by ColdReset* or 
Power-On), Reset* can be asserted on the processor in any 
state, even if the processor is no longer processing instructions. 
In this situation the processor does not read or set processor 
configuration parameters. It does, however, initialize all other 
processor state that requires hardware initialization (for 
instance, the state machines and registers), in order that the 
CPU can fetch and execute the Reset exception handler located 
in uncached and unmapped space. Although no other 
processor state is unnecessarily changed, a soft reset sequence 
may be forced to alter some state since the exception can be 
invoked arbitrarily on a cycle boundary, and abort any 
multicycle operation in progress. Since bus, cache, or other 
operations may be interrupted, portions of the cache, memory, 
or other processor state may be inconsistent. 



† In this book, a Soft Reset exception caused by assertion of the Reset* signal is referred to 
as a "soft reset" or "warm reset." A Soft Reset exception caused by a nonmaskable 
interrupt (NMI) is referred to as a "nonmaskable interrupt exception." 
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In both the Reset* and NMI cases the processor jumps to the Reset 
exception vector located in unmapped and uncached address space, so 
that the cache and TLB contents need not be initialized to service this 
exception. Typically, the Reset exception vector is located in PROM, and 
system memory does not need to be initialized to handle the exception. 

As previously noted, state machines interrupted by Reset* may cause 
some register contents to be inconsistent with the other processor state. 
Otherwise, on an exception caused by Reset* or NMI the contents of all 
registers are preserved, except for: 

• EW bit in the CacheErr register, which is reset to 0 (R4400 only) 

• ErrorEPC register, which contains the restart PC 

• ERL bit of the Status register, which is set to 1 

• SR bit of the Status register, which is set to 1 

• BEV bit of the Status register, which is set to 1 

• TS bit of the Status register, which is set to 0 

• PC is set to the reset vector 0xFFFF FFFF BFC0 0000

Soft reset exception processing is shown in Figure 5-16.
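
The register changes listed above can be sketched in C as follows. This is only an illustrative model, not processor microcode: the structure `cp0_state` and the function `soft_reset_or_nmi` are hypothetical names, and the Status bit positions (ERL = bit 2, SR = bit 20, TS = bit 21, BEV = bit 22) are assumed from the Status register layout described elsewhere in this book.

```c
#include <assert.h>
#include <stdint.h>

/* Status register bit positions (assumed from the Status register layout). */
#define SR_ERL (1u << 2)   /* Error Level */
#define SR_SR  (1u << 20)  /* Soft Reset or NMI has occurred */
#define SR_TS  (1u << 21)  /* TLB shutdown */
#define SR_BEV (1u << 22)  /* Bootstrap exception vectors */

#define RESET_VECTOR 0xFFFFFFFFBFC00000ull

/* Hypothetical model of the state touched by a soft reset or NMI. */
struct cp0_state {
    uint32_t status;
    uint64_t error_epc;
    uint64_t pc;
};

/* Apply the changes listed in the text; all other state is preserved. */
static void soft_reset_or_nmi(struct cp0_state *s, uint64_t restart_pc)
{
    s->error_epc = restart_pc;             /* ErrorEPC holds the restart PC */
    s->status |= SR_ERL | SR_SR | SR_BEV;  /* ERL, SR, and BEV are set to 1 */
    s->status &= ~SR_TS;                   /* TS is set to 0 */
    s->pc = RESET_VECTOR;                  /* jump to the Reset vector */
}
```

Bits outside the listed set (for example IE in bit 0) are left untouched by this model, matching the statement that register contents are otherwise preserved.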

Servicing 

The exception initiated by Reset* is intended to quickly reinitialize a 
previously operating processor after a fatal error such as a Master/ 
Checker mismatch. The NMI can be used for purposes other than resetting 
the processor while preserving cache and memory contents. For example, 
the system might use an NMI to cause an immediate, controlled shutdown 
when it detects an impending power failure. 

The exceptions due to Reset* and NMI appear identical to software; both 
exceptions jump to the Reset exception vector and have the Status register
SR bit set. Unless external hardware provides a way to distinguish 
between the two, they are serviced by saving the current user-visible 
processor state for diagnostic purposes and reinitializing as for the Reset 
exception. It is not normally possible to continue program execution after 
returning from this exception, since a Reset* signal can be accepted 
anytime and an NMI can occur in the midst of another error exception. 
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Address Error Exception 
Cause 

The Address Error exception occurs when an attempt is made to execute
one of the following:

• load or store a doubleword that is not aligned on a doubleword
boundary

• load, fetch, or store a word that is not aligned on a word
boundary

• load or store a halfword that is not aligned on a halfword
boundary

• reference the kernel address space from User or Supervisor
mode

• reference the supervisor address space from User mode

This exception is not maskable.
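
The alignment conditions in the list above reduce to a simple mask test. The following C sketch illustrates only the alignment part (the address-space privilege checks are not modeled); the function name `is_misaligned` is a hypothetical helper, not part of any MIPS software interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Returns true if an access of `size` bytes at `vaddr` is not naturally
 * aligned. `size` is expected to be a power of two: 2 (halfword),
 * 4 (word), or 8 (doubleword). Byte accesses (size 1) never misalign.
 */
static bool is_misaligned(uint64_t vaddr, unsigned size)
{
    return (vaddr & (uint64_t)(size - 1)) != 0;
}
```

For example, a word access at an address ending in 0x2 fails the test, while a halfword access at the same address passes.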

Processing 

The common exception vector is used for this exception. The AdEL or
AdES code in the Cause register is set, indicating whether the instruction
(as shown by the EPC register and the BD bit in the Cause register) caused
the exception with an instruction reference, load operation, or store
operation.

When this exception occurs, the BadVAddr register retains the virtual
address that was not properly aligned or that referenced protected
address space. The contents of the VPN field of the Context and EntryHi
registers are undefined, as are the contents of the EntryLo register.

The EPC register contains the address of the instruction that caused the
exception, unless this instruction is in a branch delay slot. If it is in a
branch delay slot, the EPC register contains the address of the preceding
branch instruction and the BD bit of the Cause register is set.

Address Error exception processing is shown in Figure 5-17.

Servicing

The process executing at the time is handed a UNIX SIGSEGV 
(segmentation violation) signal. This error is usually fatal to the process 
incurring the exception. 
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TLB Exceptions 

Three types of TLB exceptions can occur: 

• TLB Refill occurs when there is no TLB entry that matches an 
attempted reference to a mapped address space. 

• TLB Invalid occurs when a virtual address reference matches a 
TLB entry that is marked invalid. 

• TLB Modified occurs when a store operation virtual address 
reference to memory matches a TLB entry which is marked 
valid but is not dirty (the entry is not writable). 

The following three sections describe these TLB exceptions. 
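
As a rough illustration of how the three cases above relate, the following C sketch classifies the outcome of a TLB lookup. The types `tlb_lookup` and `tlb_result` and the function `classify_tlb_access` are hypothetical; they model only the decision order implied by the definitions above, not the hardware implementation.

```c
#include <assert.h>
#include <stdbool.h>

enum tlb_result { TLB_OK, TLB_REFILL, TLB_INVALID, TLB_MODIFIED };

/* Hypothetical summary of a TLB lookup for one mapped reference. */
struct tlb_lookup {
    bool matched; /* some TLB entry matches the virtual address */
    bool valid;   /* the matching entry is marked valid         */
    bool dirty;   /* the matching entry is marked dirty         */
};

static enum tlb_result classify_tlb_access(struct tlb_lookup e, bool is_store)
{
    if (!e.matched)
        return TLB_REFILL;    /* no matching entry */
    if (!e.valid)
        return TLB_INVALID;   /* entry matches but is marked invalid */
    if (is_store && !e.dirty)
        return TLB_MODIFIED;  /* store to a valid, non-writable entry */
    return TLB_OK;
}
```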
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TLB Refill Exception 
Cause 

The TLB refill exception occurs when there is no TLB entry to match a 
reference to a mapped address space. This exception is not maskable. 

Processing 

There are two special exception vectors for this exception; one for 
references to 32-bit address spaces, and one for references to 64-bit address 
spaces. The UX, SX, and KX bits of the Status register determine whether 
the user, supervisor or kernel address spaces referenced are 32-bit or 64- 
bit spaces. All references use these vectors when the EXL bit is set to 0 in 
the Status register. This exception sets the TLBL or TLBS code in the 
ExcCode field of the Cause register. This code indicates whether the 
instruction, as shown by the EPC register and the BD bit in the Cause
register, caused the miss by an instruction reference, load operation, or
store operation.

When this exception occurs, the BadVAddr, Context, XContext and EntryHi 
registers hold the virtual address that failed address translation. The 
EntryHi register also contains the ASID from which the translation fault 
occurred. The Random register normally contains a valid location in which 
to place the replacement TLB entry. The contents of the EntryLo register 
are undefined. The EPC register contains the address of the instruction
that caused the exception, unless this instruction is in a branch delay slot, 
in which case the EPC register contains the address of the preceding 
branch instruction and the BD bit of the Cause register is set. 

TLB Refill exception processing is shown in Figure 5-17.

Servicing

To service this exception, the contents of the Context or XContext register 
are used as a virtual address to fetch memory locations containing the 
physical page frame and access control bits for a pair of TLB entries. The 
two entries are placed into the EntryLo0/EntryLo1 registers; the EntryHi and
EntryLo registers are then written into the TLB.
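
The way the Context register doubles as a page table pointer can be sketched as below. This is a hedged illustration only: it assumes the 32-bit Context layout described elsewhere in this book (PTEBase in bits 31:23, BadVPN2 in bits 22:4), 4-Kbyte pages, and 8-byte page table entries stored as even/odd pairs. The function name `make_context` is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Form the value that hardware would leave in the Context register for a
 * miss at `bad_vaddr`, given the PTEBase that the kernel wrote earlier.
 * BadVPN2 is VA bits 31:13; bit 12 is dropped because one TLB entry maps
 * an even/odd page pair.
 */
static uint32_t make_context(uint32_t pte_base, uint32_t bad_vaddr)
{
    uint32_t bad_vpn2 = bad_vaddr >> 13;              /* 19 bits */
    return (pte_base & 0xFF800000u) | (bad_vpn2 << 4);
}
```

Under these assumptions the result is the address of a 16-byte pair of entries: the handler would load the even-page entry from that address and the odd-page entry from that address plus 8, then place them in EntryLo0 and EntryLo1.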

It is possible that the virtual address used to obtain the physical address 
and access control information is on a page that is not resident in the TLB. 
This condition is processed by allowing a TLB refill exception in the TLB 
refill handler. This second exception goes to the common exception vector 
because the EXL bit of the Status register is set. 
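
The choice of vector for a TLB miss can be summarized with a small C sketch. The offsets (0x000, 0x080, 0x180) and base addresses are those shown in the TLB/XTLB flowchart in this chapter; the function name `tlb_miss_vector` and its boolean parameters are hypothetical simplifications of the Status register bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VEC_BASE_NORMAL    0xFFFFFFFF80000000ull /* BEV = 0: unmapped, cached   */
#define VEC_BASE_BOOTSTRAP 0xFFFFFFFFBFC00200ull /* BEV = 1: unmapped, uncached */

/*
 * bev:            Status.BEV
 * exl:            Status.EXL at the time of the miss
 * is_64bit_space: the referenced space is 64-bit (per UX, SX, or KX)
 */
static uint64_t tlb_miss_vector(bool bev, bool exl, bool is_64bit_space)
{
    uint64_t base = bev ? VEC_BASE_BOOTSTRAP : VEC_BASE_NORMAL;
    uint64_t offset;

    if (exl)
        offset = 0x180;  /* miss inside a handler: common exception vector */
    else if (is_64bit_space)
        offset = 0x080;  /* XTLB refill vector */
    else
        offset = 0x000;  /* TLB refill vector  */

    return base + offset;
}
```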
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TLB Invalid Exception 
Cause 

The TLB invalid exception occurs when a virtual address reference 
matches a TLB entry that is marked invalid (TLB valid bit cleared). This 
exception is not maskable. 

Processing 

The common exception vector is used for this exception. The TLBL or 
TLBS code in the ExcCode field of the Cause register is set. This indicates 
whether the instruction, as shown by the EPC register and BD bit in the 
Cause register, caused the miss by an instruction reference, load operation, 
or store operation. 

When this exception occurs, the BadVAddr, Context, XContext and EntryHi 
registers contain the virtual address that failed address translation. The 
EntryHi register also contains the ASID from which the translation fault 
occurred. The Random register normally contains a valid location in which 
to put the replacement TLB entry. The contents of the EntryLo register are 
undefined. 

The EPC register contains the address of the instruction that caused the 
exception unless this instruction is in a branch delay slot, in which case the
EPC register contains the address of the preceding branch instruction and 
the BD bit of the Cause register is set. 

TLB Invalid exception processing is shown in Figure 5-17.

Servicing

A TLB entry is typically marked invalid when one of the following is true:

• a virtual address does not exist

• the virtual address exists, but is not in main memory (a page
fault)

• a trap is desired on any reference to the page (for example, to
maintain a reference bit)

After servicing the cause of a TLB Invalid exception, the TLB entry is 
located with TLBP (TLB Probe), and replaced by an entry with that entry's 
Valid bit set. 
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TLB Modified Exception 
Cause 

The TLB modified exception occurs when a store operation virtual address 
reference to memory matches a TLB entry that is marked valid but is not 
dirty and therefore is not writable. This exception is not maskable. 

Processing 

The common exception vector is used for this exception, and the Mod code 
in the Cause register is set. 

When this exception occurs, the BadVAddr, Context, XContext and EntryHi 
registers contain the virtual address that failed address translation. The 
EntryHi register also contains the ASID from which the translation fault 
occurred. The contents of the EntryLo register are undefined. 

The EPC register contains the address of the instruction that caused the 
exception unless that instruction is in a branch delay slot, in which case the 
EPC register contains the address of the preceding branch instruction and 
the BD bit of the Cause register is set. 

TLB Modified exception processing is shown in Figure 5-17.

Servicing

The kernel uses the failed virtual address or virtual page number to 
identify the corresponding access control information. The page 
identified may or may not permit write accesses; if writes are not 
permitted, a write protection violation occurs. 

If write accesses are permitted, the page frame is marked dirty/writable 
by the kernel in its own data structures. The TLBP instruction places the 
index of the TLB entry that must be altered into the Index register. The 
EntryLo register is loaded with a word containing the physical page frame 
and access control bits (with the D bit set), and the EntryHi and EntryLo 
registers are written into the TLB. 
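
The kernel-side decision described above might look like the following C sketch. It is an assumption-laden illustration: `page_info` stands in for whatever per-page record the kernel keeps, `service_tlb_modified` is a hypothetical name, and the D bit position (bit 2 of EntryLo) is assumed from the EntryLo format described elsewhere in this book. The TLBP probe and the TLB write themselves are not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define ENTRYLO_D (1u << 2) /* Dirty bit: page is writable (assumed position) */

/* Hypothetical kernel record for one page. */
struct page_info {
    uint32_t entrylo;         /* page frame and access control bits   */
    bool     write_permitted; /* kernel policy: may this page be written? */
    bool     dirty;           /* kernel's own dirty flag              */
};

/*
 * Returns true and stores the new EntryLo value in *entrylo_out when the
 * write is permitted. Returns false for a write protection violation.
 */
static bool service_tlb_modified(struct page_info *pg, uint32_t *entrylo_out)
{
    if (!pg->write_permitted)
        return false;

    pg->dirty = true;          /* mark dirty in kernel data structures */
    pg->entrylo |= ENTRYLO_D;  /* set the D bit for the rewritten entry */
    *entrylo_out = pg->entrylo;
    return true;
}
```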
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Cache Error Exception 
Cause 

The Cache Error exception occurs when a secondary cache ECC
error, primary cache parity error, or SysAD bus parity/ECC error
condition occurs and error detection is enabled. This exception is not
maskable, but error detection is disabled when either ERL or DE = 1 in the
Status register.

Processing 

The processor sets the ERL bit in the Status register, saves the exception 
restart address in the ErrorEPC register, records information about the 
error in the CacheErr register, and then transfers to a special vector that is 
always in uncached space (Tables 5-11 and 5-12). No other registers are 
changed. Cache Error exception processing is shown in Figure 5-15. 

Servicing 

Unlike other exception conditions, cache errors cannot be avoided while
operating at exception level, so Cache Error exceptions must be handled
from exception level. Any general register used by the handler must be
saved before use and restored before return; this includes the registers
available to regular exception handlers without save/restore.

When ERL = 1 in the Status register, the user address region becomes a
2^31-byte uncached space mapped directly to physical addresses, allowing the
Cache Error handler to save registers to memory without using a register
to construct the address. The handler can save and restore registers using
operating system-reserved locations in low physical memory by using R0
as the base register for load and store instructions.

All errors should be logged. To correct single-bit ECC errors in the
secondary cache, the system uses the CACHE instruction. Execution then
resumes through an ERET instruction. To correct cache parity errors and
non-single-bit ECC errors in unmodified cache blocks, the system uses the
CACHE instruction to invalidate the cache block, overwrites the old data
through a cache miss, and resumes execution with an ERET. Other errors are
not correctable and are likely to be fatal to the current process.

The exception handler cannot be interrupted by another Cache Error
exception because error detection is disabled while ERL = 1, so the handler
should avoid actions which might cause an unnoticed cache error. The R4400
(but not the R4000) implements the EW bit in the CacheErr register to record
a nonrecoverable error occurring while ERL = 1.
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Virtual Coherency Exception 
Cause 

A Virtual Coherency exception occurs when all of the following conditions
are true:

• a primary cache miss hits in the secondary cache 

• bits 14:12 of the virtual address were not equal to the 
corresponding bits of the PIdx field of the secondary cache tag 

• the cache algorithm for the page (from the C field in the TLB) 
specifies that the page is cached 

This exception is not maskable.

Processing

The common exception vector is used for this exception.

The VCEI or VCED code in the Cause register is set for instruction and data
cache misses respectively.

The BadVAddr register holds the virtual address that caused the exception. 
Virtual Coherency exception processing is shown in Figure 5-17. 

Servicing 

Using the appropriate CACHE instruction(s), the primary cache line at 
both the previous and the new virtual index should be invalidated† (and
written back, if necessary), and the PIdx field of the secondary cache
should be written with the new virtual index. Once completed, the 
program continues. 

Software can avoid the cost of this exception by using consistent virtual 
primary cache indexes to access the same physical data. 
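
The three conditions listed under Cause can be expressed as a single predicate. The C sketch below is illustrative only; `vaddr_pidx` and `virtual_coherency_exception` are hypothetical names, and the boolean inputs stand in for hardware state that software cannot observe directly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bits 14:12 of the virtual address, the part compared against PIdx. */
static unsigned vaddr_pidx(uint64_t vaddr)
{
    return (unsigned)((vaddr >> 12) & 0x7u);
}

/*
 * scache_hit_on_pcache_miss: a primary cache miss hit in the secondary cache
 * tag_pidx:                  PIdx field of the secondary cache tag
 * page_cached:               the page's cache algorithm says "cached"
 */
static bool virtual_coherency_exception(bool scache_hit_on_pcache_miss,
                                        uint64_t vaddr,
                                        unsigned tag_pidx,
                                        bool page_cached)
{
    return scache_hit_on_pcache_miss
        && page_cached
        && vaddr_pidx(vaddr) != (tag_pidx & 0x7u);
}
```

Two virtual addresses that map the same physical data avoid the exception when they agree in bits 14:12, which is the "consistent virtual primary cache index" the text recommends.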



† When a cache miss occurs, the processor refills the primary cache line at the present virtual
index before taking an exception. 
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Bus Error Exception 
Cause 

A Bus Error exception is raised by board-level circuitry for events such as 
bus time-out, backplane bus parity errors, and invalid physical memory 
addresses or access types. This exception is not maskable. 

A Bus Error exception occurs either when the SysCmd(5) bit indicates the 
data is erroneous (see Chapter 12) or the IvdErr* signal is asserted 
(Chapter 12). This can only occur when a cache miss refill, uncached 
reference, or an unbuffered write occurs synchronously; a Bus Error
exception resulting from a buffered write transaction must be reported 
using the general interrupt mechanism. 

Processing 

The common exception vector is used for a Bus Error exception. The IBE
or DBE code in the ExcCode field of the Cause register is set, signifying
whether the instruction (as indicated by the EPC register and BD bit in the
Cause register) caused the exception by an instruction reference, load
operation, or store operation.

The EPC register contains the address of the instruction that caused the 
exception, unless it is in a branch delay slot, in which case the EPC register
contains the address of the preceding branch instruction and the BD bit of 
the Cause register is set. Bus Error processing is shown in Figure 5-17. 

Servicing 

The physical address at which the fault occurred can be computed from
information available in the CP0 registers.

• If the IBE code in the Cause register is set (indicating an
instruction fetch reference), the virtual address is contained in
the EPC register.

• If the DBE code is set (indicating a load or store reference), the
instruction that caused the exception is located at the virtual
address contained in the EPC register (or 4 + the contents of the
EPC register if the BD bit of the Cause register is set).

The virtual address of the load or store reference can then be obtained by
interpreting the instruction. The physical address can be obtained by
using the TLBP instruction and reading the EntryLo register to compute
the physical page number.

The process executing at the time of this
exception is handed a UNIX SIGBUS (bus error) signal, which is usually
fatal.
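
The address arithmetic described above for the DBE case can be sketched in C. This is a minimal illustration, assuming the standard MIPS I-type load/store encoding (base register in bits 25:21, signed 16-bit offset in bits 15:0) and BD in bit 31 of the Cause register; the function names and the `gpr` array of saved general registers are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define CAUSE_BD (1u << 31) /* exception taken in a branch delay slot */

/* Address of the instruction that actually faulted. */
static uint64_t faulting_instruction_addr(uint64_t epc, uint32_t cause)
{
    return (cause & CAUSE_BD) ? epc + 4 : epc;
}

/* Virtual address referenced by a load or store instruction word. */
static uint64_t effective_address(uint32_t instr, const uint64_t gpr[32])
{
    unsigned base = (instr >> 21) & 0x1Fu;
    int32_t offset = (int32_t)(instr & 0xFFFFu);

    if (offset & 0x8000)
        offset -= 0x10000; /* sign-extend the 16-bit immediate */

    return gpr[base] + (uint64_t)(int64_t)offset;
}
```

Translating the resulting virtual address to a physical address still requires the TLBP probe and EntryLo read described in the text.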

Integer Overflow Exception 
Cause 

An Integer Overflow exception occurs when an ADD, ADDI, SUB, DADD, 
DADDI or DSUB† instruction results in a 2's complement overflow. This
exception is not maskable. 
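
For readers who want to see the overflow condition concretely, the following C sketch tests whether a 32-bit two's complement addition would overflow, which is the condition under which ADD and ADDI raise this exception. The function name `add32_overflows` is hypothetical; the arithmetic is done on unsigned values so that the check itself has no undefined behavior in C.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Two's complement addition overflows exactly when both operands have the
 * same sign and the sign of the sum differs from it.
 */
static bool add32_overflows(int32_t a, int32_t b)
{
    uint32_t ua = (uint32_t)a;
    uint32_t ub = (uint32_t)b;
    uint32_t sum = ua + ub; /* wraps modulo 2^32 */

    return (((ua ^ sum) & (ub ^ sum)) >> 31) != 0;
}
```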

Processing 

The common exception vector is used for this exception, and the OV code 
in the Cause register is set. 

The EPC register contains the address of the instruction that caused the 
exception unless the instruction is in a branch delay slot, in which case the 
EPC register contains the address of the preceding branch instruction and 
the BD bit of the Cause register is set. 

Integer Overflow exception processing is shown in Figure 5-17.

Servicing

The process executing at the time of the exception is handed a UNIX 
SIGFPE/FPE_INTOVF_TRAP (floating-point exception/integer 
overflow) signal. This error is usually fatal to the current process. 



† See Appendix A for a description of these instructions.
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Trap Exception 
Cause 

The Trap exception occurs when a TGE, TGEU, TLT, TLTU, TEQ, TNE,
TGEI, TGEUI, TLTI, TLTUI, TEQI, or TNEI† instruction results in a TRUE
condition. This exception is not maskable. 
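
The conditions tested by the register forms of the trap instructions can be sketched as follows. This is an illustration of the comparisons only (signed versus unsigned), with hypothetical names `trap_op` and `trap_taken`; the immediate forms apply the same comparisons to a sign-extended immediate.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum trap_op { TGE, TGEU, TLT, TLTU, TEQ, TNE };

/* Returns true when the trap condition is TRUE for operands rs and rt. */
static bool trap_taken(enum trap_op op, int64_t rs, int64_t rt)
{
    switch (op) {
    case TGE:  return rs >= rt;                      /* signed   */
    case TGEU: return (uint64_t)rs >= (uint64_t)rt;  /* unsigned */
    case TLT:  return rs < rt;                       /* signed   */
    case TLTU: return (uint64_t)rs < (uint64_t)rt;   /* unsigned */
    case TEQ:  return rs == rt;
    case TNE:  return rs != rt;
    }
    return false;
}
```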

Processing 

The common exception vector is used for this exception, and the Tr code 
in the Cause register is set. 

The EPC register contains the address of the instruction causing the
exception unless the instruction is in a branch delay slot, in which case the
EPC register contains the address of the preceding branch instruction and
the BD bit of the Cause register is set.

Trap exception processing is shown in Figure 5-17.

Servicing

The process executing at the time of a Trap exception is handed a UNIX
SIGFPE/FPE_INTOVF_TRAP (floating-point exception/integer
overflow) signal. This error is usually fatal.
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System Call Exception 
Cause 

A System Call exception occurs during an attempt to execute the
SYSCALL instruction. This exception is not maskable. 

Processing 

The common exception vector is used for this exception, and the Sys code 
in the Cause register is set. 

The EPC register contains the address of the SYSCALL instruction unless 
it is in a branch delay slot, in which case the EPC register contains the 
address of the preceding branch instruction. 

If the SYSCALL instruction is in a branch delay slot, the BD bit of the Cause
register is set; otherwise this bit is cleared.

System Call exception processing is shown in Figure 5-17.

Servicing

When this exception occurs, control is transferred to the applicable system
routine.

To resume execution, the EPC register must be altered so that the
SYSCALL instruction does not re-execute; this is accomplished by adding
a value of 4 to the EPC register (EPC register + 4) before returning.

If a SYSCALL instruction is in a branch delay slot, a more complicated 
algorithm, beyond the scope of this description, may be required. 
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Breakpoint Exception 
Cause 

A Breakpoint exception occurs when an attempt is made to execute the
BREAK instruction. This exception is not maskable. 

Processing 

The common exception vector is used for this exception, and the BP code 
in the Cause register is set. 

The EPC register contains the address of the BREAK instruction unless it 
is in a branch delay slot, in which case the EPC register contains the 
address of the preceding branch instruction. 

If the BREAK instruction is in a branch delay slot, the BD bit of the Cause
register is set; otherwise the bit is cleared.

Breakpoint exception processing is shown in Figure 5-17.

Servicing

When the Breakpoint exception occurs, control is transferred to the 
applicable system routine. Additional distinctions can be made by 
analyzing the unused bits of the BREAK instruction (bits 25:6), and
loading the contents of the instruction whose address the EPC register 
contains. A value of 4 must be added to the contents of the EPC register 
(EPC register + 4) to locate the instruction if it resides in a branch delay 
slot. 

To resume execution, the EPC register must be altered so that the BREAK
instruction does not re-execute; this is accomplished by adding a value of 
4 to the EPC register (EPC register + 4) before returning. 

If a BREAK instruction is in a branch delay slot, interpretation of the 
branch instruction is required to resume execution. 
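
A handler might locate the BREAK instruction and extract its code field along the following lines. This C sketch assumes the standard BREAK encoding (SPECIAL major opcode 0 in bits 31:26, function code 0x0D in bits 5:0); the helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Address of the BREAK itself: EPC, or EPC + 4 when in a branch delay slot. */
static uint64_t break_instr_addr(uint64_t epc, bool bd)
{
    return bd ? epc + 4 : epc;
}

/* True if the instruction word is a BREAK (SPECIAL opcode, funct 0x0D). */
static bool is_break(uint32_t instr)
{
    return (instr >> 26) == 0 && (instr & 0x3Fu) == 0x0Du;
}

/* The 20 otherwise unused bits (25:6), available to software as a code. */
static uint32_t break_code(uint32_t instr)
{
    return (instr >> 6) & 0xFFFFFu;
}
```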






Reserved Instruction Exception 
Cause 

The Reserved Instruction exception occurs when one of the following 
conditions occurs: 

• an attempt is made to execute an instruction with an undefined
major opcode (bits 31:26)

• an attempt is made to execute a SPECIAL instruction with an
undefined minor opcode (bits 5:0)

• an attempt is made to execute a REGIMM instruction with an
undefined minor opcode (bits 20:16)

• an attempt is made to execute 64-bit operations in 32-bit mode
when in User or Supervisor modes

64-bit operations are always valid in Kernel mode regardless of the value 
of the KX bit in the Status register. 

This exception is not maskable. 
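
The bit fields named in the list above can be extracted from an instruction word as shown in this C sketch. The helper names are hypothetical, and the values 0 (SPECIAL) and 1 (REGIMM) for the major opcode are assumed from the standard MIPS opcode map; which field values are actually undefined is given by the opcode tables in Appendix A and is not reproduced here.

```c
#include <assert.h>
#include <stdint.h>

#define OP_SPECIAL 0u /* major opcode whose minor opcode is in bits 5:0   */
#define OP_REGIMM  1u /* major opcode whose minor opcode is in bits 20:16 */

static unsigned major_opcode(uint32_t instr)
{
    return instr >> 26;            /* bits 31:26 */
}

static unsigned special_minor(uint32_t instr)
{
    return instr & 0x3Fu;          /* bits 5:0 */
}

static unsigned regimm_minor(uint32_t instr)
{
    return (instr >> 16) & 0x1Fu;  /* bits 20:16 */
}
```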

Reserved Instruction exception processing is shown in Figure 5-17.

Processing

The common exception vector is used for this exception, and the RI code 
in the Cause register is set. 

The EPC register contains the address of the reserved instruction unless it
is in a branch delay slot, in which case the EPC register contains the 
address of the preceding branch instruction. 

Servicing 

No instructions in the MIPS ISA are currently interpreted. The process 
executing at the time of this exception is handed a UNIX SIGILL/ 
ILL_RESOP_FAULT (illegal instruction/reserved operand fault) signal.
This error is usually fatal. 
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Coprocessor Unusable Exception 
Cause 

The Coprocessor Unusable exception occurs when an attempt is made to
execute a coprocessor instruction for either:

• a corresponding coprocessor unit that has not been marked
usable, or

• CP0 instructions, when the unit has not been marked usable
and the process executes in either User or Supervisor mode.

This exception is not maskable.

Processing

The common exception vector is used for this exception, and the CpU code
in the Cause register is set. The contents of the Coprocessor Usage Error field 
of the coprocessor Control register indicate which of the four coprocessors 
was referenced. The EPC register contains the address of the unusable 
coprocessor instruction unless it is in a branch delay slot, in which case the 
EPC register contains the address of the preceding branch instruction. 

Coprocessor Unusable exception processing is shown in Figure 5-17.

Servicing

The coprocessor unit to which an attempted reference was made is 
identified by the Coprocessor Usage Error field, which results in one of the 
following situations: 

• If the process is entitled access to the coprocessor, the 
coprocessor is marked usable and the corresponding user state 
is restored to the coprocessor. 

• If the process is entitled access to the coprocessor, but the 
coprocessor does not exist or has failed, interpretation of the 
coprocessor instruction is possible. 

• If the BD bit is set in the Cause register, the branch instruction 
must be interpreted; then the coprocessor instruction can be 
emulated and execution resumed with the EPC register 
advanced past the coprocessor instruction. 

• If the process is not entitled access to the coprocessor, the 
process executing at the time is handed a UNIX SIGILL/ 
ILL_PRIVIN_FAULT (illegal instruction/privileged instruction
fault) signal. This error is usually fatal. 
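
A handler typically begins by decoding the Cause register. The following C sketch shows one way to pull out the fields used in this section, assuming the Cause register layout described elsewhere in this book (BD in bit 31, the coprocessor unit number CE in bits 29:28, ExcCode in bits 6:2). The function names are hypothetical, and the value 11 for the Coprocessor Unusable code is an assumption taken from the ExcCode table.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define EXC_CPU 11u /* Coprocessor Unusable exception code (assumed value) */

static unsigned cause_exccode(uint32_t cause)
{
    return (cause >> 2) & 0x1Fu;   /* bits 6:2 */
}

static unsigned cause_ce(uint32_t cause)
{
    return (cause >> 28) & 0x3u;   /* bits 29:28: coprocessor unit 0..3 */
}

static bool cause_bd(uint32_t cause)
{
    return ((cause >> 31) & 1u) != 0;
}
```

With these helpers, a handler could test `cause_exccode(c) == EXC_CPU` and then use `cause_ce(c)` to decide which coprocessor to enable or emulate.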
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Floating-Point Exception 
Cause 

The Floating-Point exception is used by the floating-point coprocessor. 
This exception is not maskable. 

Processing 

The common exception vector is used for this exception, and the FPE code 
in the Cause register is set. 

The contents of the Floating-Point Control/Status register indicate the cause 
of this exception. 

Floating-Point exception processing is shown in Figure 5-17.

Servicing

This exception is cleared by clearing the appropriate bit in the Floating- 
Point Control/Status register. 

For an unimplemented instruction exception, the kernel should emulate 
the instruction; for other exceptions, the kernel should pass the exception 
to the user program that caused the exception. 
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Watch Exception 
Cause 

A Watch exception occurs when a load or store instruction references the
physical address specified in the WatchLo/WatchHi System Control
Coprocessor (CP0) registers. The WatchLo register specifies whether a
load or store initiated this exception. 

The CACHE instruction never causes a Watch exception. 

The Watch exception is postponed if the EXL bit is set in the Status register, 
and Watch is only maskable by setting the EXL bit in the Status register. 

Processing 

The common exception vector is used for this exception, and the Watch 
code in the Cause register is set. 

Watch exception processing is shown in Figure 5-17.

Servicing

The Watch exception is a debugging aid; typically the exception handler 
transfers control to a debugger, allowing the user to examine the situation. 

To continue, the Watch exception must be disabled to execute the faulting 
instruction. The Watch exception must then be reenabled. The faulting 
instruction can be executed either by interpretation or by setting 
breakpoints. 
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Interrupt Exception 
Cause 

The Interrupt exception occurs when one of the eight interrupt conditions
is asserted. The significance of these interrupts is dependent upon the 
specific system implementation. 

Each of the eight interrupts can be masked by clearing the corresponding 
bit in the Int-Mask field of the Status register, and all of the eight interrupts 
can be masked at once by clearing the IE bit of the Status register. 

Processing 

The common exception vector is used for this exception, and the Int code
in the Cause register is set.

The IP field of the Cause register indicates current interrupt requests. It is 
possible that more than one of the bits can be simultaneously set (or even 
no bits may be set) if the interrupt is asserted and then deasserted before 
this register is read. 

Interrupt exception processing is shown in Figure 5-17.

Servicing

If the interrupt is caused by one of the two software-generated exceptions
(SW1 or SW0), the interrupt condition is cleared by setting the
corresponding Cause register bit to 0.

If the interrupt is hardware-generated, the interrupt condition is cleared 
by correcting the condition causing the interrupt pin to be asserted. 
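
The masking rules above can be sketched in C. This is a hedged model, assuming IE, EXL, and ERL in Status bits 0, 1, and 2, the interrupt mask in Status bits 15:8, the IP field in Cause bits 15:8, and the two software interrupts in the lowest two IP bits. The function names are hypothetical. The EXL and ERL tests reflect the flowchart comments in this chapter stating that interrupts are disabled while either bit is 1.

```c
#include <assert.h>
#include <stdint.h>

#define SR_IE  (1u << 0)
#define SR_EXL (1u << 1)
#define SR_ERL (1u << 2)

/* Bitmask (bit n = interrupt n) of interrupts that would be taken now. */
static unsigned pending_interrupts(uint32_t status, uint32_t cause)
{
    if (!(status & SR_IE) || (status & (SR_EXL | SR_ERL)))
        return 0; /* all interrupts disabled */

    return (cause >> 8) & (status >> 8) & 0xFFu; /* IP masked by IM */
}

/* Clear software interrupt n (0 for SW0, 1 for SW1) in a Cause value. */
static uint32_t clear_sw_interrupt(uint32_t cause, unsigned n)
{
    return cause & ~(1u << (8 + (n & 1u)));
}
```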
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5.4 Exception Handling and Servicing Flowcharts 

The remainder of this chapter contains flowcharts for the following 
exceptions and guidelines for their handlers:

• general exceptions and their exception handler 

• TLB/XTLB miss exception and their exception handler 

• cache error exception and its handler 

• reset, soft reset and NMI exceptions, and a guideline to their 
handler. 

Generally speaking, the exceptions are handled by hardware (HW); the 
exceptions are then serviced by software (SW). 
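
As a reading aid for the hardware half of this split, the following C sketch models the steps of the general exception flowchart (Figure 5-18). It is a simplified, hypothetical model, not a description of the actual implementation: the structure `cpu_model` and the function `take_general_exception` are invented for illustration, register side effects such as BadVAddr are omitted, and skipping the EPC and BD update when EXL is already 1 is an assumption based on the flowchart's "exception within another exception" check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SR_EXL   (1u << 1)
#define SR_BEV   (1u << 22)
#define CAUSE_BD (1u << 31)

/* Hypothetical subset of processor state. */
struct cpu_model {
    uint64_t pc;
    uint64_t epc;
    uint32_t status;
    uint32_t cause;
};

static void take_general_exception(struct cpu_model *c, unsigned exccode,
                                   bool in_delay_slot)
{
    /* Record the exception code in Cause bits 6:2. */
    c->cause = (c->cause & ~(0x1Fu << 2)) | ((exccode & 0x1Fu) << 2);

    /* Assumed: EPC and BD are updated only if not already at exception level. */
    if (!(c->status & SR_EXL)) {
        if (in_delay_slot) {
            c->cause |= CAUSE_BD;
            c->epc = c->pc - 4; /* address of the preceding branch */
        } else {
            c->cause &= ~CAUSE_BD;
            c->epc = c->pc;
        }
    }

    c->status |= SR_EXL; /* Kernel mode, interrupts disabled */

    /* Common exception vector: base selected by BEV, offset 0x180. */
    c->pc = ((c->status & SR_BEV) ? 0xFFFFFFFFBFC00200ull
                                  : 0xFFFFFFFF80000000ull) + 0x180;
}
```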
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Exceptions other than Reset, Soft Reset, NMI, CacheError or first-level miss 
Note: Interrupts can be masked by IE or IMs 
and Watch is masked if EXll = 1 

Comments 



Set Watch Register 
Set FP Control Status Register 
EnHi <- VPN2, ASID 
Context <- VPN2 
Set Cause Register 

EXCCode, CE 

Set BadVA 



Check if exception within 
another exception 



Yes 



Cause 31 (BD) <- 1 
EPC <- (PC - 4) 





■ 




=0 (normal) 






PC <- OxFFFF FFFF 8000 0000 + 180 


(unmapped, 


cached) 



BEV 




*Watch & FP Control Status Register 
are only set if the respective exception 
occurs. 

EnHi, X/Context are set only for 
*TLB- Invalid, Modified, 
& Refill exceptions 
BadVA is set only for 
TLB- Invalid, Modified, 
Refill- and VCED/I exceptions 
Note: not set if it is a Bus Error 



No 



Cause 31 (BD) <- 0 
EPC <- PC 



EXL <- 1 Processor forced to Kernel Mode 
& interrupt disabled 



=1 (bootstrap) 




PC <- OxFFFF FFFF BFCO 0200 + 180 
(unmapped, uncached) 



J 



T 

To General Exception Servicing Guidelines 



Figure 5-18 General Exception Handler (HW) 
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Comments 



MFCO- 

X/Context 
EPC 
Status 
Cause 



MTCO - 
(Set Status Bits:) 
KSU<- 00 
EXL <- 0 
& IE=1 



Check CAUSE REG. & Jump to 
appropriate Service Code 



* Unmapped vector so TLBMod, TLBInv, 

TLB Refill exceptions not possible 

* EXL=1 so Watch, Interrupt exceptions disabled 

* OS/System to avoid ail other exceptions 

*Only CacheError, Reset, Soft Reset, NMI 
exceptions possible. 



(optional - only to enable Interrupts while keeping Kernel Mode) 



* After EXL=0, all exceptions allowed, 
(except interrupt If masked by IE or IM 
and CacheError if masked by DE) 



Reset the processor 




Optional: Check only if 2nd-level TLB miss 



Service Code 



EXL= 1 



MTCO- 
EPC 



STATUS 




< 



* ERET is not allowed In the branch delay slot of 
another Jump Instruction 

* Processor does not execute the instruction which Is 

in the ERET's branch delay slot 

* PC <- EPC; EXL <- 0 

* LLbit <- 0 



Figure 5-19 General Exception Servicing Guidelines (SW) 
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EnHi<- VPN2, ASID 
Context <- VPN2 
Set Cause Reg. 

EXCCode, CE and 

Set BadVA 



EnHi <- VPN2, ASID 
Context <- VPN2 
Set Cause Reg. 

EXCCode, CE and 
Set BadVA 




Check if exception within 
another exception 



EPC <- (PC - 4) 
Cause bit 31 (BD) <- 1 



EPC <- PC 
Cause bit 31 (BD) <- 0 

















Y ^ 


Instruction? 


\ N 












Vec. Off. = 


= 0x080 






Vec. Off. = 0x000 



Vec. Off. = 0x180 



Points to Refill Exception 



EXL <- 1 



=0 (normal) 




Points to General Exception 

Processor forced to Kernel Mode & 
interrupt disabled 



=1 (bootstrap) 



PC <- OxFFFF FFFF 8000 0000 + Vec.Off. 
(unmapped, cached) 



PC <- OxFFFF FFFF BFCO 0200 + Vec.Off. 
(unmapped, uncached) 



To TLB/XTLB Exception Servicing Guidelines 

Figure 5-20 TLB/XTLB Miss Exception Handler (HW) 
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Comments 



MFCO- 

CONTEXT 



Service Code 



* Unmapped vector so TLBMod, TLBInv, 
TLB Refill or VCEP exceptions 

not possible 

* EXL=1 so Watch, Interrupt exceptions disabled 

* OS/System to avoid all other exceptions 

•Only CacheError, Reset, Soft Reset, NMI 
exceptions possible. 



* Load the mapping of the virtual address in Context Reg. 
Move it to ENLO and Write into the TLB 

* There could be a TLB miss again during the mapping 
of the data or instruction address. The processor will 
jump to the general exception vector since the EXL is 1 . 
(Option to complete the first level refill in the general 
exception handler or ERET to the original instruction 
and take the exception again) 




* ERET is not allowed in the branch delay slot of 
another Jump Instruction 

* Processor does not execute the instruction which is 
in the ERET's branch delay slot 

* PC <- EPC; EXL <- 0 

* LLbit <- 0 



Figure 5-21 TLB/XTLB Exception Servicing Guidelines (SW) 
[Figure 5-22: flowchart, summarized]

Cache Error Exception Handling (HW)

Note: Can be masked/disabled by DE (SR16) bit = 1

Instruction in branch delay slot?
  Yes: ErrEPC <- (PC - 4)
  No:  ErrEPC <- PC

ERL <- 1

BEV = 0 (normal):     PC <- 0xFFFF FFFF A000 0000 + 0x100  (unmapped, uncached)
BEV = 1 (bootstrap):  PC <- 0xFFFF FFFF BFC0 0200 + 0x100  (unmapped, uncached)

Servicing Guidelines (SW)

Comments

* Unmapped, uncached vector, so TLB-related and Cache Error exceptions
  are not possible
* ERL = 1, so Interrupt exceptions are disabled
* OS/System to avoid all other exceptions
* Only Reset, Soft Reset, and NMI exceptions are possible
* ERET is not allowed in the branch delay slot of another Jump instruction
* The processor does not execute the instruction in the ERET's branch
  delay slot
* PC <- ErrorEPC; ERL <- 0
* LLbit <- 0

Figure 5-22 Cache Error Exception Handling (HW) and Servicing Guidelines (SW)
[Figure 5-23: flowchart, summarized]

Soft Reset or NMI Exception (HW)

Status:  BEV <- 1
         TS  <- 0
         SR  <- 1
         ERL <- 1
CacheErr(EW) <- 0  (R4400 only)

Reset Exception (HW)

Random <- TLBENTRIES - 1
Wired  <- 0
Config <- Update(31:6) || Undef(5:0)
Status:  BEV <- 1
         TS  <- 0
         SR  <- 0
         ERL <- 1
CacheErr(EW) <- 0  (R4400 only)

PC <- 0xFFFF FFFF BFC0 0000

Note: There is no indication from the processor to differentiate between
NMI and Soft Reset; there must be a system-level indication.

(Optional)

Reset Service Code

Figure 5-23 Reset, Soft Reset & NMI Exception Handling (HW) and Servicing Guidelines (SW)
This chapter describes the MIPS floating-point unit (FPU) features,
including the programming model, instruction set and formats, and the
pipeline.

The FPU, with associated system software, fully conforms to the
requirements of ANSI/IEEE Standard 754-1985, IEEE Standard for Binary
Floating-Point Arithmetic. In addition, the MIPS architecture fully supports
the recommendations of the standard and precise exceptions.
6.1 Overview

The FPU operates as a coprocessor for the CPU (it is assigned coprocessor
label CP1), and extends the CPU instruction set to perform arithmetic
operations on floating-point values.

Figure 6-1 illustrates the functional organization of the FPU.

[Block diagram: Data Cache; FP Bypass; Pipeline Chain; FCU; 64-bit data
paths; FP Mul, FP Div, and FP Sqrt units]

Figure 6-1 FPU Functional Block Diagram
6.2 FPU Features

This section briefly describes the operating model, the load/store
instruction set, and the coprocessor interface in the FPU. A more detailed
description is given in the sections that follow.

• Full 64-bit Operation. When the FR bit in the CPU Status
register equals 0, the FPU is in 32-bit mode and contains thirty-two
32-bit registers that hold single- or, when used in pairs,
double-precision values. When the FR bit in the CPU Status
register equals 1, the FPU is in 64-bit mode and the registers
are expanded to 64 bits wide. Each register can hold single- or
double-precision values. The FPU also includes a 32-bit
Control/Status register that provides access to all IEEE-Standard
exception handling capabilities.

• Load and Store Instruction Set. Like the CPU, the FPU uses a
load- and store-oriented instruction set, with single-cycle load
and store operations. Floating-point operations are started in a
single cycle and their execution overlaps other fixed-point or
floating-point operations.

• Tightly Coupled Coprocessor Interface. The FPU resides on-chip
to form a tightly coupled unit with a seamless integration
of floating-point and fixed-point instruction sets. Since each
unit receives and executes instructions in parallel, some
floating-point instructions can execute at the same
single-cycle-per-instruction rate as fixed-point instructions.
6.3 FPU Programming Model

This section describes the set of FPU registers and their data organization. 

The FPU registers include Floating-Point General Purpose registers (FGRs) 
and two control registers: Control/Status and Implementation/Revision. 

Floating-Point General Registers (FGRs) 

The FPU has a set of Floating-Point General Purpose registers (FGRs) that 
can be accessed in the following ways: 

• As 32 general purpose registers (32 FGRs), each of which is 32
bits wide when the FR bit in the CPU Status register equals 0;
or as 32 general purpose registers (32 FGRs), each of which is
64 bits wide when FR equals 1. The CPU accesses these
registers through move, load, and store instructions.

• As 16 floating-point registers (see the next section for a
description of FPRs), each of which is 64 bits wide, when the
FR bit in the CPU Status register equals 0. The FPRs hold
values in either single- or double-precision floating-point
format. Each FPR corresponds to adjacently numbered FGRs
as shown in Figure 6-2.

• As 32 floating-point registers (see the next section for a
description of FPRs), each of which is 64 bits wide, when the
FR bit in the CPU Status register equals 1. The FPRs hold
values in either single- or double-precision floating-point
format. Each FPR corresponds to an FGR as shown in
Figure 6-2.
Figure 6-2 FPU Registers

Floating-Point Registers

The FPU provides:

• 16 Floating-Point registers (FPRs) when the FR bit in the Status
register equals 0, or

• 32 Floating-Point registers (FPRs) when the FR bit in the Status
register equals 1.

These 64-bit registers hold floating-point values during floating-point
operations and are physically formed from the General Purpose registers
(FGRs). When the FR bit in the Status register equals 1, the FPR references
a single 64-bit FGR.

The FPRs hold values in either single- or double-precision floating-point
format. If the FR bit equals 0, only even numbers (the least register, as
shown in Figure 6-2) can be used to address FPRs. When the FR bit is set
to a 1, all FPR register numbers are valid.

If the FR bit equals 0 during a double-precision floating-point operation,
the general registers are accessed in double pairs. Thus, in a
double-precision operation, selecting Floating-Point Register 0 (FPR0)
actually addresses adjacent Floating-Point General Purpose registers FGR0
and FGR1.
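
The FPR-to-FGR correspondence described above can be sketched as a small Python helper. The function name and its error handling are hypothetical and serve only to illustrate the register pairing rule for double-precision operands.

```python
def fgrs_for_fpr(fpr: int, fr: int) -> tuple:
    """Return the FGR numbers that back a double-precision FPR.

    fr is the value of the FR bit in the CPU Status register.
    """
    if not 0 <= fpr <= 31:
        raise ValueError("FPR number must be in the range 0..31")
    if fr:
        # FR = 1: every FPR number is valid and maps to one 64-bit FGR.
        return (fpr,)
    if fpr % 2:
        # FR = 0: only even register numbers address an FPR.
        raise ValueError("FR = 0: only even FPR numbers are valid")
    # FR = 0: the FPR is formed from two adjacent 32-bit FGRs.
    return (fpr, fpr + 1)
```

With FR = 0, `fgrs_for_fpr(0, 0)` returns `(0, 1)`, matching the FPR0 example in the text.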
Floating-Point Control Registers

The FPU has 32 control registers (FCRs) that can only be accessed by move
operations. The FCRs are described below:

• The Implementation/Revision register (FCR0) holds revision
information about the FPU.

• The Control/Status register (FCR31) controls and monitors
exceptions, holds the result of compare operations, and
establishes rounding modes.

• FCR1 to FCR30 are reserved.

Table 6-1 lists the assignments of the FCRs.

Table 6-1 Floating-Point Control Register Assignments

FCR Number      Use
--------------  ------------------------------------------------
FCR0            Coprocessor implementation and revision register
FCR1 to FCR30   Reserved
FCR31           Rounding mode, cause, trap enables, and flags
Implementation and Revision Register (FCR0)

The read-only Implementation and Revision register (FCR0) specifies the
implementation and revision number of the FPU. This information can
determine the coprocessor revision and performance level, and can also be
used by diagnostic software.

Figure 6-3 shows the layout of the register; Table 6-2 describes the
Implementation and Revision register (FCR0) fields.

Implementation/Revision Register (FCR0)

 31              16 15           8 7            0
+------------------+--------------+--------------+
|        0         |     Imp      |     Rev      |
+------------------+--------------+--------------+
        16                8              8

Figure 6-3 Implementation/Revision Register

Table 6-2 FCR0 Fields

Field  Description
-----  ------------------------------------------------------------------
Imp    Implementation number (0x05)
Rev    Revision number in the form of y.x
0      Reserved. Must be written as zeroes, and returns zeroes when read.

The revision number is a value of the form y.x, where:

• y is a major revision number held in bits 7:4.

• x is a minor revision number held in bits 3:0.

The revision number distinguishes some chip revisions; however, MIPS 
does not guarantee that changes to its chips are necessarily reflected by the 
revision number, or that changes to the revision number necessarily reflect 
real chip changes. For this reason revision number values are not listed, 
and software should not rely on the revision number to characterize the 
chip. 
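
As an illustration of the field layout described above, the following Python sketch splits a raw FCR0 value into its Imp and Rev fields. The function name and the sample value are hypothetical, and the Imp field position (bits 15:8) is taken from Figure 6-3; as the text notes, software should not rely on the revision number to characterize the chip.

```python
def decode_fcr0(fcr0: int) -> dict:
    """Split a 32-bit FCR0 value into implementation and revision fields."""
    imp = (fcr0 >> 8) & 0xFF   # Imp field, bits 15:8
    rev = fcr0 & 0xFF          # Rev field, bits 7:0
    major = (rev >> 4) & 0xF   # y: major revision, bits 7:4
    minor = rev & 0xF          # x: minor revision, bits 3:0
    return {"imp": imp, "major": major, "minor": minor,
            "revision": f"{major}.{minor}"}
```

A made-up value such as `0x0523` would decode to implementation number 5 and revision "2.3".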
Control/Status Register (FCR31)

The Control/Status register (FCR31) contains control and status
information that can be accessed by instructions in either Kernel or User
mode. FCR31 also controls the arithmetic rounding mode and enables
User mode traps, as well as identifying any exceptions that may have
occurred in the most recently executed instruction, along with any
exceptions that may have occurred without being trapped.

Figure 6-4 shows the format of the Control/Status register, and Table 6-3
describes the Control/Status register fields. Figure 6-5 shows the
Control/Status register Cause, Flag, and Enable fields.
Figure 6-4 FP Control/Status Register Bit Assignments

Table 6-3 Control/Status Register Fields

Field    Description
-------  ----------------------------------------------------------------
FS       When set, denormalized results are flushed to 0 instead of
         causing an unimplemented operation exception.
C        Condition bit. See description of Control/Status register
         Condition bit.
Cause    Cause bits. See Figure 6-5 and the description of Control/Status
         register Cause, Flag, and Enable bits.
Enables  Enable bits. See Figure 6-5 and the description of Control/Status
         register Cause, Flag, and Enable bits.
Flags    Flag bits. See Figure 6-5 and the description of Control/Status
         register Cause, Flag, and Enable bits.
RM       Rounding mode bits. See Table 6-4 and the description of
         Control/Status register Rounding Mode Control bits.

Figure 6-5 Control/Status Register Cause, Flag, and Enable Fields

Accessing the Control/Status Register 

When the Control/Status register is read by a Move Control From
Coprocessor 1 (CFC1) instruction, all unfinished instructions in the
pipeline are completed before the contents of the register are moved to the
main processor. If a floating-point exception occurs as the pipeline
empties, the FP exception is taken and the CFC1 instruction is re-executed
after the exception is serviced.

The bits in the Control/Status register can be set or cleared by writing to the
register using a Move Control To Coprocessor 1 (CTC1) instruction.
FCR31 must only be written to when the FPU is not actively executing
floating-point operations; this can be ensured by reading the contents of
the register to empty the pipeline.
IEEE Standard 754

IEEE Standard 754 specifies that floating-point operations detect certain 
exceptional cases, raise flags, and can invoke an exception handler when 
an exception occurs. These features are implemented in the MIPS 
architecture with the Cause, Enable, and Flag fields of the Control/Status 
register. The Flag bits implement IEEE 754 exception status flags, and the 
Cause and Enable bits implement exception handling. 

Control/Status Register FS Bit 

When the FS bit is set, denormalized results are flushed to 0 instead of 
causing an unimplemented operation exception. 

Control/Status Register Condition Bit 

When a floating-point Compare operation takes place, the result is stored 
at bit 23, the Condition bit, to save or restore the state of the condition line. 
The C bit is set to 1 if the condition is true; the bit is cleared to 0 if the 
condition is false. Bit 23 is affected only by compare and Move Control To 
FPU instructions. 

Control/Status Register Cause, Flag, and Enable Fields 

Figure 6-5 illustrates the Cause, Flag, and Enable fields of the Control/Status 
register. 

Cause Bits 

Bits 17:12 in the Control/Status register contain Cause bits, as shown in
Figure 6-5, which reflect the results of the most recently executed
instruction. The Cause bits are a logical extension of the CP0 Cause register;
they identify the exceptions raised by the last floating-point operation and
raise an interrupt or exception if the corresponding enable bit is set. If
more than one exception occurs on a single instruction, each appropriate
bit is set.

The Cause bits are written by each floating-point operation (but not by
load, store, or move operations). The Unimplemented Operation (E) bit is
set to a 1 if software emulation is required, otherwise it remains 0. The
other bits are set to 1 or 0 to indicate the occurrence or non-occurrence
(respectively) of an IEEE 754 exception.
When a floating-point exception is taken, no results are stored, and the
only state affected is the Cause bit.

Enable Bits

A floating-point exception is generated any time a Cause bit and the
corresponding Enable bit are set. A floating-point operation that sets an
enabled Cause bit forces an immediate exception, as does setting both
Cause and Enable bits with CTC1.

There is no enable for Unimplemented Operation (E). Setting
Unimplemented Operation always generates a floating-point exception.

Before returning from a floating-point exception, software must first clear
the enabled Cause bits with a CTC1 instruction to prevent a repeat of the
interrupt. Thus, User mode programs can never observe enabled Cause
bits set; if this information is required in a User mode handler, it must be
passed somewhere other than the Status register.

For a floating-point operation that sets only unenabled Cause bits, no 
exception occurs and the default result defined by IEEE 754 is stored. In 
this case, the exceptions that were caused by the immediately previous 
floating-point operation can be determined by reading the Cause field. 
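
The rule for when a Cause bit produces an exception can be sketched in Python. This is an illustrative model only: the Cause field position (bits 17:12) comes from the text, while the Enable field position (bits 11:7) and the placement of the E bit at the top of the Cause field are assumptions based on the register layout in Figures 6-4 and 6-5, which are not reproduced here. The names are hypothetical.

```python
CAUSE_SHIFT, CAUSE_MASK = 12, 0x3F    # Cause field, bits 17:12 (from the text)
ENABLE_SHIFT, ENABLE_MASK = 7, 0x1F   # Enable field, assumed to be bits 11:7
E_BIT = 1 << 5                        # Unimplemented Operation, assumed top Cause bit


def fp_exception_pending(fcr31: int) -> bool:
    """Return True if the Cause/Enable fields would generate an FP exception."""
    cause = (fcr31 >> CAUSE_SHIFT) & CAUSE_MASK
    enables = (fcr31 >> ENABLE_SHIFT) & ENABLE_MASK
    # E has no enable bit, so it always generates an exception when set.
    # The other five Cause bits trap only when the matching Enable bit is set.
    return bool(cause & E_BIT) or bool(cause & enables)
```

A Cause bit whose Enable bit is clear leaves the function returning False, which corresponds to the case where the IEEE 754 default result is stored.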

Flag Bits 

The Flag bits are cumulative and indicate that an exception was raised by
an operation that was executed since they were explicitly reset. Flag bits
are set to 1 if an IEEE 754 exception is raised, otherwise they remain
unchanged. The Flag bits are never cleared as a side effect of floating-point
operations; however, they can be set or cleared by writing a new value into
the Control/Status register, using a Move Control To Coprocessor 1 (CTC1)
instruction.

When a floating-point exception is taken, the flag bits are not set by the
hardware; floating-point exception software is responsible for setting
these bits before invoking a user handler.
Control/Status Register Rounding Mode Control Bits

Bits 1 and 0 in the Control/Status register constitute the Rounding Mode
(RM) field.

As shown in Table 6-4, these bits specify the rounding mode that the FPU
uses for all floating-point operations.

Table 6-4 Rounding Mode Bit Decoding

Rounding
Mode
RM(1:0)   Mnemonic  Description
--------  --------  ---------------------------------------------------
0         RN        Round result to nearest representable value; round
                    to value with least-significant bit 0 when the two
                    nearest representable values are equally near.
1         RZ        Round toward 0: round to value closest to and not
                    greater in magnitude than the infinitely precise
                    result.
2         RP        Round toward +∞: round to value closest to and not
                    less than the infinitely precise result.
3         RM        Round toward -∞: round to value closest to and not
                    greater than the infinitely precise result.
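
The four rounding modes can be illustrated with Python's `decimal` module, whose rounding constants follow the same definitions. This sketch rounds a decimal value to an integer rather than to a binary floating-point format, so it shows only the direction of rounding, not FPU behavior; the mapping table and function name are hypothetical.

```python
from decimal import (Decimal, ROUND_CEILING, ROUND_DOWN, ROUND_FLOOR,
                     ROUND_HALF_EVEN)

# RM(1:0) value -> equivalent decimal rounding mode
RM_TO_DECIMAL = {
    0: ROUND_HALF_EVEN,  # RN: nearest, ties to the value with LSB 0
    1: ROUND_DOWN,       # RZ: toward 0
    2: ROUND_CEILING,    # RP: toward +infinity
    3: ROUND_FLOOR,      # RM: toward -infinity
}


def round_to_integer(value: str, rm: int) -> int:
    """Round a decimal string to an integer using the given RM encoding."""
    rounded = Decimal(value).quantize(Decimal(1), rounding=RM_TO_DECIMAL[rm])
    return int(rounded)
```

For instance, `round_to_integer("-2.1", 2)` rounds toward +∞ and `round_to_integer("-2.1", 3)` rounds toward -∞, so the two calls give different results for the same input.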
6.4 Floating-Point Formats

The FPU performs both 32-bit (single-precision) and 64-bit
(double-precision) IEEE standard floating-point operations. The 32-bit
single-precision format has a 24-bit signed-magnitude fraction field (f+s)
and an 8-bit exponent (e), as shown in Figure 6-6.

Figure 6-6 Single-Precision Floating-Point Format

The 64-bit double-precision format has a 53-bit signed-magnitude fraction
field (f+s) and an 11-bit exponent, as shown in Figure 6-7.

Figure 6-7 Double-Precision Floating-Point Format

As shown in the above figures, numbers in floating-point format are
composed of three fields:

• sign field, $s$

• biased exponent, $e = E + bias$

• fraction, $f = .b_1 b_2 \ldots b_{p-1}$

The range of the unbiased exponent $E$ includes every integer between the
two values $E_{min}$ and $E_{max}$ inclusive, together with two other reserved
values:

• $E_{min} - 1$ (to encode 0 and denormalized numbers)

• $E_{max} + 1$ (to encode $\infty$ and NaNs [Not a Number])

For single- and double-precision formats, each representable nonzero
numerical value has just one encoding.
For single- and double-precision formats, the value of a number, v, is
determined by the equations shown in Table 6-5.

Table 6-5 Equations for Calculating Values in Single and Double-Precision
Floating-Point Format

No.  Equation
---  -------------------------------------------------------------------
(1)  if $E = E_{max}+1$ and $f \neq 0$, then $v$ is NaN, regardless of $s$
(2)  if $E = E_{max}+1$ and $f = 0$, then $v = (-1)^s \infty$
(3)  if $E_{min} \le E \le E_{max}$, then $v = (-1)^s 2^E (1.f)$
(4)  if $E = E_{min}-1$ and $f \neq 0$, then $v = (-1)^s 2^{E_{min}} (0.f)$
(5)  if $E = E_{min}-1$ and $f = 0$, then $v = (-1)^s 0$

For all floating-point formats, if v is NaN, the most-significant bit of f
determines whether the value is a signaling or quiet NaN: v is a signaling
NaN if the most-significant bit of f is set; otherwise, v is a quiet NaN.
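
The equations of Table 6-5 can be worked through for the single-precision format with a short Python sketch. The function name is hypothetical, and the constants are the single-precision parameters (exponent bias 127, $E_{max}$ = 127, $E_{min}$ = -126, and 23 stored fraction bits with the integer bit hidden). The sketch does not distinguish signaling from quiet NaNs.

```python
import math
import struct

E_MAX, E_MIN, BIAS = 127, -126, 127
FRACTION_BITS = 23  # stored fraction bits; the integer bit is hidden


def decode_single(word: int) -> float:
    """Evaluate a 32-bit single-precision pattern using Table 6-5."""
    s = (word >> 31) & 1
    e = (word >> FRACTION_BITS) & 0xFF         # biased exponent
    f = word & ((1 << FRACTION_BITS) - 1)      # fraction field
    E = e - BIAS                               # unbiased exponent
    sign = -1.0 if s else 1.0
    frac = f / (1 << FRACTION_BITS)            # numeric value of .f
    if E == E_MAX + 1:
        # Equations (1) and (2): NaN or infinity.
        return math.nan if f else sign * math.inf
    if E == E_MIN - 1:
        # Equations (4) and (5): denormalized number or zero.
        return sign * 2.0 ** E_MIN * frac
    # Equation (3): normalized number.
    return sign * 2.0 ** E * (1.0 + frac)
```

The result can be cross-checked against the host's own decoding, for example with `struct.unpack(">f", struct.pack(">I", word))[0]`.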

Table 6-6 defines the values for the format parameters; minimum and
maximum floating-point values are given in Table 6-7.

Table 6-6 Floating-Point Format Parameter Values

                              Format
Parameter                     Single    Double
----------------------------  --------  --------
Emax                          +127      +1023
Emin                          -126      -1022
Exponent bias                 +127      +1023
Exponent width in bits        8         11
Integer bit                   hidden    hidden
f (Fraction width in bits)    24        53
Format width in bits          32        64
Table 6-7 Minimum and Maximum Floating-Point Values

Type                  Value
--------------------  -----------------------
Float Minimum         1.40129846e-45
Float Minimum Norm    1.17549435e-38
Float Maximum         3.40282347e+38
Double Minimum        4.9406564584124654e-324
Double Minimum Norm   2.2250738585072014e-308
Double Maximum        1.7976931348623157e+308
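
The values in Table 6-7 follow from the format parameters. The Python sketch below derives them, assuming the fraction width p (24 or 53) counts the hidden integer bit; the function name and dictionary keys are hypothetical.

```python
def format_limits(e_max: int, e_min: int, p: int) -> dict:
    """Derive extreme values of a format from its exponent range and width p."""
    return {
        # Smallest denormalized value: only the last fraction bit is set.
        "minimum": 2.0 ** (e_min - (p - 1)),
        # Smallest normalized value: fraction 1.0 at the minimum exponent.
        "minimum_norm": 2.0 ** e_min,
        # Largest finite value: all fraction bits set at the maximum exponent.
        "maximum": (2.0 - 2.0 ** (1 - p)) * 2.0 ** e_max,
    }


single = format_limits(127, -126, 24)
double = format_limits(1023, -1022, 53)
```

Here `single` and `double` hold the derived limits for the two formats, which can be compared with the tabulated values.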



6.5 Binary Fixed-Point Format 



Binary fixed-point values are held in 2's complement format. Unsigned 
fixed-point values are not directly provided by the floating-point 
instruction set. Figure 6-8 illustrates binary fixed-point format; Table 6-8 
lists the binary fixed-point format fields. 



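 31   30                               0
+------+--------------------------------+
| Sign |            Integer             |
+------+--------------------------------+
   1                   31

Figure 6-8 Binary Fixed-Point Format

Field assignments of the binary fixed-point format are:

Table 6-8 Binary Fixed-Point Format Fields

Field    Description
-------  -------------
sign     sign bit
integer  integer value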
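
A minimal Python sketch of how a 32-bit fixed-point word in 2's complement format is interpreted, using the sign and integer fields of Figure 6-8. The function name is hypothetical.

```python
def fixed_point_value(word: int) -> int:
    """Interpret a 32-bit word as a 2's complement fixed-point value."""
    word &= 0xFFFFFFFF
    sign = word >> 31               # bit 31
    integer = word & 0x7FFFFFFF     # bits 30:0
    # In 2's complement the sign bit carries a weight of -2**31.
    return integer - (sign << 31)
```

For example, the all-ones word `0xFFFFFFFF` is interpreted as -1.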
6.6 Floating-Point Instruction Set Overview

All FPU instructions are 32 bits long, aligned on a word boundary. They
can be divided into the following groups:

• Load, Store, and Move instructions move data between
memory, the main processor, and the FPU General Purpose
registers.

• Conversion instructions perform conversion operations
between the various data formats.

• Computational instructions perform arithmetic operations on 
floating-point values in the FPU registers. 

• Compare instructions perform comparisons of the contents of 
registers and set a conditional bit based on the results. 

• Branch on FPU Condition instructions perform a branch to the 
specified target if the specified coprocessor condition is met. 

In the instruction formats shown in Tables 6-9 through 6-12, the fmt 
appended to the instruction opcode specifies the data format: S specifies 
single-precision binary floating-point, D specifies double-precision binary 
floating-point, W specifies 32-bit binary fixed-point, and L specifies 64-bit 
(long) binary fixed-point. 



Table 6-9 FPU Instruction Summary: Load, Move and Store Instructions

OpCode  Description
------  --------------------------
LWC1    Load Word to FPU
SWC1    Store Word from FPU
LDC1    Load Doubleword to FPU
SDC1    Store Doubleword From FPU
MTC1    Move Word To FPU
MFC1    Move Word From FPU
CTC1    Move Control Word To FPU
CFC1    Move Control Word From FPU
DMTC1   Doubleword Move To FPU
DMFC1   Doubleword Move From FPU

Table 6-10 FPU Instruction Summary: Conversion Instructions

OpCode       Description
-----------  ---------------------------------------------
CVT.S.fmt    Floating-point Convert to Single FP
CVT.D.fmt    Floating-point Convert to Double FP
CVT.W.fmt    Floating-point Convert to 32-bit Fixed Point
CVT.L.fmt    Floating-point Convert to 64-bit Fixed Point
ROUND.W.fmt  Floating-point Round to 32-bit Fixed Point
ROUND.L.fmt  Floating-point Round to 64-bit Fixed Point
TRUNC.W.fmt  Floating-point Truncate to 32-bit Fixed Point
TRUNC.L.fmt  Floating-point Truncate to 64-bit Fixed Point
CEIL.W.fmt   Floating-point Ceiling to 32-bit Fixed Point
CEIL.L.fmt   Floating-point Ceiling to 64-bit Fixed Point
FLOOR.W.fmt  Floating-point Floor to 32-bit Fixed Point
FLOOR.L.fmt  Floating-point Floor to 64-bit Fixed Point

Table 6-11 FPU Instruction Summary: Computational Instructions

OpCode    Description
--------  -----------------------------
ADD.fmt   Floating-point Add
SUB.fmt   Floating-point Subtract
MUL.fmt   Floating-point Multiply
DIV.fmt   Floating-point Divide
ABS.fmt   Floating-point Absolute Value
MOV.fmt   Floating-point Move
NEG.fmt   Floating-point Negate
SQRT.fmt  Floating-point Square Root

Table 6-12 FPU Instruction Summary: Compare and Branch Instructions

OpCode      Description
----------  --------------------------
C.cond.fmt  Floating-point Compare
BC1T        Branch on FPU True
BC1F        Branch on FPU False
BC1TL       Branch on FPU True Likely
BC1FL       Branch on FPU False Likely
Floating-Point Load, Store, and Move Instructions

This section discusses the manner in which the FPU uses the load, store
and move instructions listed in Table 6-9; Appendix B provides a detailed
description of each instruction.

Transfers Between FPU and Memory

All data movement between the FPU and memory is accomplished by
using one of the following instructions:

• Load Word To Coprocessor 1 (LWC1) or Store Word From
Coprocessor 1 (SWC1) instructions, which reference a single
32-bit word of the FPU general registers

• Load Doubleword (LDC1) or Store Doubleword (SDC1)
instructions, which reference a 64-bit doubleword.

These load and store operations are unformatted; no format conversions
are performed and therefore no floating-point exceptions can occur due to
these operations.

Transfers Between FPU and CPU

Data can also be moved directly between the FPU and the CPU by using
one of the following instructions:

• Move To Coprocessor 1 (MTC1)

• Move From Coprocessor 1 (MFC1)

• Doubleword Move To Coprocessor 1 (DMTC1)

• Doubleword Move From Coprocessor 1 (DMFC1)

Like the floating-point load and store operations, these operations
perform no format conversions and never cause floating-point exceptions.

Load Delay and Hardware Interlocks

The instruction immediately following a load can use the contents of the
loaded register. In such cases the hardware interlocks, requiring
additional real cycles; for this reason, scheduling load delay slots is
desirable, although it is not required for functional code.
Data Alignment

All coprocessor loads and stores reference the following aligned data 
items: 

• For word loads and stores, the access type is always WORD, 
and the low-order 2 bits of the address must always be 0. 

• For doubleword loads and stores, the access type is always 
DOUBLEWORD, and the low-order 3 bits of the address must 
always be 0. 
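
The alignment rule above amounts to a mask test on the low-order address bits. The following Python sketch shows the check; the function name is hypothetical, and the access type strings mirror the names used in the text.

```python
# Number of low-order address bits that must be 0 for each access type.
LOW_ORDER_BITS = {"WORD": 2, "DOUBLEWORD": 3}


def is_aligned(address: int, access_type: str) -> bool:
    """Return True if the address is aligned for the given access type."""
    mask = (1 << LOW_ORDER_BITS[access_type]) - 1
    return address & mask == 0
```

An address such as `0x1004` passes the WORD test but fails the DOUBLEWORD test.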

Endianness 

Regardless of byte-numbering order (endianness) of the data, the address 
specifies the byte that has the smallest byte address in the addressed field. 
For a big-endian system, it is the leftmost byte; for a little-endian system, 
it is the rightmost byte. 
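
The byte-ordering rule can be seen by laying out a word in memory with Python's `struct` module. This is an illustration on the host, not FPU code; the function name is hypothetical. Index 0 of the returned bytes corresponds to the smallest byte address.

```python
import struct


def bytes_in_memory(word: int, big_endian: bool) -> bytes:
    """Return the four bytes of a 32-bit word in ascending address order."""
    return struct.pack(">I" if big_endian else "<I", word)
```

For the word `0x11223344`, the byte at the lowest address is the leftmost byte (`0x11`) in a big-endian layout and the rightmost byte (`0x44`) in a little-endian layout.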

Floating-Point Conversion Instructions 

Conversion instructions perform conversions between the various data 
formats such as single- or double-precision, fixed- or floating-point 
formats. Table 6-10 lists conversion instructions; Appendix B gives a 
detailed description of each instruction. 

Floating-Point Computational Instructions 

Computational instructions perform arithmetic operations on floating-point
values in registers. Table 6-11 lists the computational instructions,
and Appendix B provides a detailed description of each instruction. There
are two categories of computational instructions:

• 3-Operand Register-Type instructions, which perform floating-point
addition, subtraction, multiplication, and division

• 2-Operand Register-Type instructions, which perform floating-point
absolute value, move, negate, and square root operations

Branch on FPU Condition Instructions 

Table 6-12 lists the Branch on FPU (coprocessor unit 1) condition 
instructions that can test the result of the FPU compare (C.cond) 
instructions. Appendix B gives a detailed description of each instruction. 
Floating-Point Compare Operations

The floating-point compare (C.cond.fmt) instructions interpret the
contents of two FPU registers (fs, ft) in the specified format (fmt) and
arithmetically compare them. A result is determined based on the
comparison and conditions (cond) specified in the instruction.

Table 6-12 lists the compare instructions; Appendix B gives a detailed
description of each instruction. Table 6-13 lists the mnemonics for the
compare instruction conditions.

Table 6-13 Mnemonics and Definitions of Compare Instruction Conditions

Mnemonic  Definition                              Mnemonic  Definition
--------  --------------------------------------  --------  --------------------------------------
T         True                                    F         False
OR        Ordered                                 UN        Unordered
NEQ       Not Equal                               EQ        Equal
OLG       Ordered or Less Than or Greater Than    UEQ       Unordered or Equal
UGE       Unordered or Greater Than or Equal      OLT       Ordered Less Than
OGE       Ordered Greater Than or Equal           ULT       Unordered or Less Than
UGT       Unordered or Greater Than               OLE       Ordered Less Than or Equal
OGT       Ordered Greater Than                    ULE       Unordered or Less Than or Equal
ST        Signaling True                          SF        Signaling False
GLE       Greater Than, or Less Than or Equal     NGLE      Not Greater Than or Less Than or Equal
SNE       Signaling Not Equal                     SEQ       Signaling Equal
GL        Greater Than or Less Than               NGL       Not Greater Than or Less Than
NLT       Not Less Than                           LT        Less Than
GE        Greater Than or Equal                   NGE       Not Greater Than or Equal
NLE       Not Less Than or Equal                  LE        Less Than or Equal
GT        Greater Than                            NGT       Not Greater Than
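
The relationship between the condition mnemonics can be sketched in Python. The sketch takes the second mnemonic of each pair in Table 6-13 and assumes the MIPS condition encoding in which a 4-bit code selects the less-than (bit 2), equal (bit 1), and unordered (bit 0) relations; that encoding is described in Appendix B rather than in this section, so treat it as an assumption here. Bit 3, which requests an exception on unordered operands, is not modeled, and the names are hypothetical.

```python
import math

# Second mnemonic of each pair in Table 6-13, in ascending order of the
# assumed 4-bit condition code (bit 2 = less, bit 1 = equal, bit 0 = unordered).
PREDICATES = ["F", "UN", "EQ", "UEQ", "OLT", "ULT", "OLE", "ULE",
              "SF", "NGLE", "SEQ", "NGL", "LT", "NGE", "LE", "NGT"]


def compare(cond: str, fs: float, ft: float) -> bool:
    """Evaluate a compare condition on two values; True sets the C bit."""
    code = PREDICATES.index(cond)
    unordered = math.isnan(fs) or math.isnan(ft)
    less = not unordered and fs < ft
    equal = not unordered and fs == ft
    return bool((code & 4 and less) or
                (code & 2 and equal) or
                (code & 1 and unordered))
```

The first mnemonic of each pair (for example OGE opposite ULT) is the logical negation of the second, so it can be obtained as `not compare(...)`.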
6.7 FPU Instruction Pipeline Overview

The FPU provides an instruction pipeline that parallels the CPU
instruction pipeline. It shares the same eight-stage pipeline architecture
with the CPU (see Chapter 3).

Instruction Execution

Figure 6-9 illustrates the 8-instruction overlap in the FPU pipeline.

Figure 6-9 FPU Instruction Pipeline

Figure 6-9 assumes that one instruction is completed every PCycle. Most
FPU instructions, however, require more than one cycle in the EX stage.
This means the FPU must stall the pipeline if an instruction execution
cannot proceed because of register or resource conflicts.

Figure 6-10 illustrates the effect of a three-cycle stall on the FPU pipeline.

Figure 6-10 FPU Pipeline Stall



To lessen the performance impact that results from stalling the instruction 
pipeline, the FPU allows instructions to overlap so that instruction 
execution can proceed as long as there are no resource conflicts, data 
dependencies, or exception conditions. The following sections describe 
the timing and overlapping of FPU instructions. 

Instruction Execution Cycle Time 

Unlike the CPU, which executes almost all instructions in a single cycle, 
more time may be required to execute FPU instructions. 

Table 6-14 gives the minimum latency, in processor pipeline cycles, of each 
floating-point operation for the currently implemented configurations. 
These latency calculations assume the result of the operation is 
immediately used in a succeeding operation. 




Table 6-14 Floating-Point Operation Latencies

                   Pipeline Cycles                          Pipeline Cycles
Operation          S     D     W     L     Operation        S     D     W     L
-----------------  ----  ----  ----  ----  ---------------  ----  ----  ----  ----
ADD.fmt            4     4     (a)   (a)   CVT.[W,L].fmt    4     4     (a)   (a)
SUB.fmt            4     4     (a)   (a)   C.cond.fmt       3     3     (a)   (a)
MUL.fmt            7     8     (a)   (a)   BC1T             (b)   1     (b)   (b)
DIV.fmt            23    36    (a)   (a)   BC1F             (b)   1     (b)   (b)
SQRT.fmt           54    112   (a)   (a)   BC1TL            (b)   1     (b)   (b)
ABS.fmt            2     2     (a)   (a)   BC1FL            (b)   1     (b)   (b)
MOV.fmt            1     1     (a)   (a)   LWC1             (b)   3     (b)   (b)
NEG.fmt            2     2     (a)   (a)   SWC1             (b)   1     (b)   (b)
ROUND.[W,L].fmt    4     4     (a)   (a)   LDC1             (b)   3     (b)   (b)
TRUNC.[W,L].fmt    4     4     (a)   (a)   SDC1             (b)   1     (b)   (b)
CEIL.[W,L].fmt     4     4     (a)   (a)   MTC1             (b)   3     (b)   (b)
FLOOR.[W,L].fmt    4     4     (a)   (a)   MFC1             (b)   3     (b)   (b)
CVT.S.fmt          (a)   4     6     7     CTC1             (b)   3     (b)   (b)
CVT.D.fmt          2     (a)   5     4     CFC1             (b)   2     (b)   (b)

(a) These operations are illegal.

(b) These operations are undefined.
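
Because the latencies in Table 6-14 assume that each result is used immediately by the next operation, the minimum time for a chain of dependent operations is the sum of their latencies. The Python sketch below encodes a few rows of the table; the dictionary layout and function name are hypothetical, and the result ignores any additional stalls.

```python
# (operation, format) -> minimum latency in pipeline cycles, from Table 6-14
LATENCY = {
    ("ADD", "S"): 4, ("ADD", "D"): 4,
    ("SUB", "S"): 4, ("SUB", "D"): 4,
    ("MUL", "S"): 7, ("MUL", "D"): 8,
    ("DIV", "S"): 23, ("DIV", "D"): 36,
    ("SQRT", "S"): 54, ("SQRT", "D"): 112,
}


def dependent_chain_cycles(ops) -> int:
    """Sum the latencies of operations that each consume the previous result."""
    return sum(LATENCY[op] for op in ops)
```

For example, a double-precision multiply feeding a double-precision add takes at least `dependent_chain_cycles([("MUL", "D"), ("ADD", "D")])` cycles.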
Scheduling FPU Instructions

The floating-point architecture permits the overlapping of floating-point
load, store, and move instructions with some of the other processor
operations.

To permit this, the FPU coprocessor implements three separate operation
(op) units:

• divider

• multiplier

• adder (for remaining operations)

The multiplier and divider can overlap adder operations; however, they
use the adder on their final cycles, which imposes some limitations.

The multiplier can begin a new double-precision multiplication every four
cycles, and a new single-precision multiplication every three cycles. The
adder generally begins a new operation one cycle before the previous
operation completes; therefore, a floating-point addition or subtraction
can start every three cycles.

The FPU coprocessor pipeline is fully bypassed and interlocked.

FPU Pipeline Overlapping

Each of the three op units is controlled by an FPU resource scheduler,
which issues instructions under constraints described in the following
section. Table 6-15 lists the pipe stages used in each of the op units
(although not all stages are used by each unit).

Table 6-15 FPU Operational Unit Pipe Stages

Stage  Description
-----  ------------------------------
A      FPU Adder Mantissa Add stage
E      FPU Adder Exception Test stage
EX     CPU EX stage
M      FPU Multiplier 1st stage
N      FPU Multiplier 2nd stage
R      FPU Adder Result Round stage
S      FPU Adder Operand Shift stage
U      FPU Unpack stage
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Instruction Scheduling Constraints 

The FPU resource scheduler is kept from issuing instructions to the FPU 
op units (adder, multiplier, and divider) by the limitations in their micro- 
architectures. If any of the following constraints are violated, the op unit 
assumes the outstanding instruction in its pipe is discarded, and then 
continues operation on the most recently issued instruction. 

FPU Divider Constraints 

The FPU divider can handle only one non-overlapped division instruction 
in its pipe at any one time. 



FPU Multiplier Constraints 

The FPU multiplier allows up to two pipelined MUL.[S,D] instructions to 
be processed as long as the following constraints are met: 

• Two idle cycles are required after a MUL.S instruction (as 
shown in Figure 6-11). 

• Three idle cycles are required after MUL.D instruction (as 
shown in Figure 6-12). 

These figures are not meant to imply that back-to-back multiplications are 
allowed. Rather, as shown in Figure 6-11, instructions I2 and I3 are illegal 
and I5, I6, I7, and I8 are successive stages of I4, referenced to I1. 

Figure 6-12 is similar, in that I6, I7, and I8 are successive stages of I5. 
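
The idle-cycle constraints on the multiplier can be sketched as a small check. This is an illustrative model only, not the scheduler's actual logic; the names `IDLE_CYCLES_AFTER`, `earliest_next_mul`, and `is_legal_mul_issue` are hypothetical, and cycles are simply counted from the issue of the earlier multiplication.

```python
# Idle cycles required after each multiply, taken from the constraints above.
IDLE_CYCLES_AFTER = {"MUL.S": 2, "MUL.D": 3}


def earliest_next_mul(prev_op: str, prev_issue_cycle: int) -> int:
    """Return the first cycle in which another MUL.[S,D] may be issued."""
    # The required idle cycles must elapse after the issue cycle itself.
    return prev_issue_cycle + IDLE_CYCLES_AFTER[prev_op] + 1


def is_legal_mul_issue(prev_op: str, prev_issue_cycle: int, issue_cycle: int) -> bool:
    """True if a multiply issued at issue_cycle respects the idle-cycle rule."""
    return issue_cycle >= earliest_next_mul(prev_op, prev_issue_cycle)
```

Under this model, a multiplication may follow a MUL.S issued in cycle 1 no earlier than cycle 4, and a MUL.D issued in cycle 1 no earlier than cycle 5, which agrees with the repeat rates of three and four cycles stated earlier.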



[Figure 6-11: pipeline diagram. A MUL.S instruction (I1) is followed by 
candidate MUL.[S,D] instructions I2 through I8. The "Legal to Issue?" 
column marks the first instructions after I1 as No and the later 
instructions as Yes.] 

Figure 6-11 MUL.S Instruction Scheduling in the FPU Multiplier 

[Figure 6-12: pipeline diagram. A MUL.D instruction (I1) is followed by 
candidate MUL.[S,D] instructions I2 through I8. The "Legal to Issue?" 
column marks the first instructions after I1 as No and the later 
instructions as Yes.] 

Figure 6-12 MUL.D Instruction Scheduling in the FPU Multiplier 

FPU Adder Constraints 

Following are the constraints that must be met in the FPU adder op unit. 

Cycle Overlap. The adder op unit must allow a clock cycle overlap 
between each newly issued instruction and the instruction being 
completed, as shown in Figure 6-13. 



[Figure 6-13: pipeline diagram of the instruction sequence NEG.[S,D], 
ADD.[S,D], NOP, NOP, C.COND.[S,D], NOP, SQRT.[S,D], NOP, NOP, ADD.[S,D], 
showing the stages of each instruction in the adder.] 

Figure 6-13 Instruction Cycle Overlap in FPU Adder 



Resource Conflict. The adder must allow the cleanup stages (A, R) of a 
multiplication instruction to be pipelined with the execution of an 
ADD.[S,D], SUB.[S,D], or C.COND.[S,D] instruction, as long as no two 
instructions simultaneously attempt to use the same A and R pipe stages. 
For instance, Figure 6-14 shows a resource conflict between the mantissa 
add (A), stage 7, of instructions I1, I5, and I6. This figure also shows the 
resource conflict between result round (R), stage 8, of instructions I1, I5, 
and I6. The multiplication cleanup cycles (A, R) can neither overlap nor 
pipeline with any other instruction currently in the adder pipe. 



Figures 6-14 through 6-17 show these constraints. 



[Figure 6-14: pipeline diagram. A MUL.D instruction (I1) is followed by 
ADD.[S,D] instructions I2 through I8; conflicting stages are marked as 
resource conflicts. Legal to issue, in the order listed: Yes, Yes, Yes, No, 
No, Yes, Yes.] 

Figure 6-14 MUL.D and ADD.[S,D] Cycle Conflict in FPU Adder 



[Figure 6-15: pipeline diagram. A MUL.S instruction (I1) is followed by 
ADD.[S,D] instructions I2 through I8; conflicting stages are marked as 
resource conflicts. Legal to issue, in the order listed: Yes, Yes, No, No, 
Yes, Yes, Yes.] 

Figure 6-15 MUL.S and ADD.[S,D] Cycle Conflict in FPU Adder 

[Figure 6-16: pipeline diagram. A MUL.D instruction (I1) is followed by 
CMP.[S,D] instructions I2 through I8; conflicting stages are marked as 
resource conflicts. Legal to issue, in the order listed: Yes, Yes, Yes, No†, 
No, Yes, Yes.] 

† While there is no resource conflict in issuing this CMP.[S,D] instruction, 
the hardware does not allow it. 

Figure 6-16 MUL.D and CMP.[S,D] Cleanup Cycle Conflict in FPU Adder 

[Figure 6-17: pipeline diagram. A MUL.S instruction (I1) is followed by 
CMP.[S,D] instructions I2 through I8; conflicting stages are marked as 
resource conflicts. Legal to issue, in the order listed: Yes, Yes, No†, No, 
Yes, Yes, Yes.] 

† While there is no resource conflict in issuing this CMP.[S,D] instruction, 
the hardware does not allow it. 

Figure 6-17 MUL.S and CMP.[S,D] Cleanup Cycle Conflict in FPU Adder 

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Prep and Cleanup Cycle Overlap. The adder does not allow the 
preparation (U stage) and cleanup cycles (N, A, R) of a division instruction 
to be pipelined with any other instruction; however, the adder does allow 
the last cycle of preparation or cleanup to be overlapped one clock by the 
following instruction's U stage (the CPU EX cycle). Figure 6-18 shows this 
process. 



[Figure 6-18: pipeline diagram of a division instruction followed by NOP 
instructions and an ADD.[S,D] or CMP.[S,D] instruction, showing the stages 
of the division and of the following instruction.] 

Figure 6-18 Adder Prep and Cleanup Cycle Overlap 

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Instruction Latency, Repeat Rate, and Pipeline Stage Sequences 

Table 6-16 lists the latency and repeat rate between instructions, together 
with the sequence of pipeline stages for each instruction. For example, the 
latency of the ADD.[S,D] is 4, which means it takes four processor cycles 
to complete. The Repeat Rate column indicates how soon an instruction 
can be repeated; for example, an ADD.[S,D] can be repeated after the 
conclusion of the third pipeline stage. 



Table 6-16 Latency, Repeat Rate, and Pipe Stages of FPU Instructions 

Instruction Type         Latency   Repeat Rate   Pipeline Stage Sequence
MOV.[S,D]                1         1             EX
ADD.[S,D]                4         3             U -> S+A -> A+R -> R+S
SUB.[S,D]                4         3             U -> S+A -> A+R -> R+S
C.COND.[S,D]             3         2             U -> A -> R
NEG.[S,D]                2         1             U -> S
ABS.[S,D]                2         1             U -> S
CVT.S.W                  6         5             ... -> R
CVT.D.W                  5         4             U -> A -> R -> S
CVT.S.L                  7         6             ... -> A -> R -> A -> R
CVT.D.L                  4         3             ... -> A -> R -> S
CVT.D.S                  2         1             U -> S
CVT.S.D                  4         3             ... -> A -> R
CVT.[W,L].[S,D] or       4         3             ... -> A -> R
ROUND.[W,L].[S,D] or
TRUNC.[W,L].[S,D] or
CEIL.[W,L].[S,D] or
FLOOR.[W,L].[S,D]
MUL.S                    7         3             ... -> E/M -> M -> N/A -> R
MUL.D                    8         4             ... -> E/M -> M -> N/A -> R
DIV.S                    23        22            ... -> S+A -> S+R -> D...D -> D/A ->
                                                 D/R -> D/A -> D/R -> A -> R
DIV.D                    36        35            ... -> A -> R -> D...D -> D/A -> D/R ->
                                                 D/A -> D/R -> A -> R
SQRT.S                   2-54      2-53          ... -> E -> A+R -> ... -> A+R -> A -> R
SQRT.D                   2-112     2-111         ... -> E -> A+R -> ... -> A+R -> A -> R
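
As a rough illustration of how latency and repeat rate combine, the sketch below estimates the cycles needed for a run of independent instructions of one type. It assumes each instruction issues exactly one repeat interval after the previous one and that no other stall or interlock intervenes; the dictionary `FPU_TIMING` and the function `cycles_for_independent_ops` are hypothetical names, and only a few rows of Table 6-16 are copied in.

```python
# (latency, repeat rate) in processor cycles, copied from Table 6-16.
FPU_TIMING = {
    "ADD": (4, 3),
    "C.COND": (3, 2),
    "MUL.S": (7, 3),
    "MUL.D": (8, 4),
    "DIV.S": (23, 22),
    "DIV.D": (36, 35),
}


def cycles_for_independent_ops(op: str, count: int) -> int:
    """Estimate cycles for `count` independent instructions of type `op`.

    Assumes every instruction after the first issues one repeat interval
    after its predecessor, and the last one needs its full latency.
    """
    if count <= 0:
        return 0
    latency, repeat = FPU_TIMING[op]
    return (count - 1) * repeat + latency
```

For example, under these assumptions four independent MUL.D instructions take 3 x 4 + 8 = 20 cycles, while four dependent ones would take at least 4 x 8 = 32 cycles.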
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Resource Scheduling Rules 

The FPU Resource Scheduler issues instructions while adhering to the 
rules described below. These scheduling rules optimize op unit 
executions; if the rules are not followed, the hardware interlocks to 
guarantee correct operation. 

DIV.[S,D] can start only when all of the following conditions are met in 
the RF stage: 

• The divider is either idle, or in its second-to-last execution cycle. 

• The adder is either idle, or in its second-to-last execution cycle. 

• The multiplier is either idle, or in its second-to-last execution 
cycle. 

Idle means an operation unit — adder, multiplier or divider — is either not 
processing any instruction, or is currently in its last execution cycle 
completing an instruction. 
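
The DIV.[S,D] start rule and the definition of idle can be sketched as follows. This is a simplified model for illustration, not a description of the hardware; the `OpUnit` class, its `cycles_remaining` counter, and `can_start_div` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class OpUnit:
    """Toy model of one op unit (adder, multiplier, or divider)."""
    busy: bool = False
    cycles_remaining: int = 0  # execution cycles left, counting the current one

    def is_idle(self) -> bool:
        # Idle: not processing any instruction, or in its last execution cycle.
        return (not self.busy) or self.cycles_remaining == 1

    def in_second_to_last_cycle(self) -> bool:
        return self.busy and self.cycles_remaining == 2

    def ready(self) -> bool:
        return self.is_idle() or self.in_second_to_last_cycle()


def can_start_div(divider: OpUnit, adder: OpUnit, multiplier: OpUnit) -> bool:
    """DIV.[S,D] may start only when all three units satisfy the rule."""
    return divider.ready() and adder.ready() and multiplier.ready()
```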
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MUL.[S,D] can start only when all of the following conditions are met in 
the RF stage: 

• The multiplier is one of the following: 

- idle, or in its second-to-last execution cycle. 

- not within the first two execution cycles (EX, EX+1) if the 
most recent instruction in the multiplier pipe is MUL.S 

- not within the first three execution cycles (EX...EX+2) if 
the most recent instruction in the multiplier pipe is 
MUL.D 

• The adder is one of the following: 

- idle, or in its second-to-last execution cycle. 

- not processing the first execution cycle (EX) of CVT.S.L 

• The adder is not processing a square root instruction 

• The divider is one of the following: 

- idle, or in its second-to-last execution cycle. 

- in the first 8 execution cycles (EX...EX+7) of a DIV.S 

- in the first 21 execution cycles, except for the second 
execution cycle (cycles EX, EX+2...EX+20), of a DIV.D 
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SQRT.[S,D] can start only when all of the following conditions are met in 
the RF stage: 

• The divider is either idle, or in its second-to-last execution cycle. 

• The adder is either idle, or in its second-to-last execution cycle. 

• The multiplier is either idle, or in its second-to-last execution 
cycle. 

CVT.fmt, NEG.[S,D] or ABS.[S,D] instructions can only start when all of 
the following conditions are met in the RF stage: 

• The adder is either idle, or in its second-to-last execution cycle. 

• The multiplier is either idle, or in its second-to-last execution 
cycle. 

• The divider is one of the following: 

- idle, or in its second-to-last execution cycle. 

- in the third through eighth execution cycle (EX+2...EX+7) 
of a DIV.S 

- in the third through twenty-first execution cycle 
(EX+2...EX+20) of a DIV.D 
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ADD.[S,D], SUB.[S,D] or C.COND.[S,D] can only start when all of the 

following conditions are met in the RF stage: 

• The adder is either idle, or in its second-to-last execution cycle. 

• The multiplier is one of the following: 

- idle, or in its second-to-last execution cycle. 

- not in the third or fourth execution cycles (EX+2...EX+3) 
if the most recent instruction in the multiplier pipe is 
MUL.S 

- not in the fourth or fifth execution cycles (EX+3...EX+4) if 
the most recent instruction in the multiplier pipe is 
MUL.D 

• The divider is one of the following: 

- idle, or in its second-to-last execution cycle. 

- in the third through eighth execution cycle (EX+2...EX+7) 
of a DIV.S 

- in the third through twenty-first execution cycle 
(EX+2...EX+20) of a DIV.D 






Floating-Point Exceptions 



7 



This chapter describes FPU floating-point exceptions, including FPU 
exception types, exception trap processing, exception flags, saving and 
restoring state when handling an exception, and trap handlers for IEEE 
Standard 754 exceptions. 

A floating-point exception occurs whenever the FPU cannot handle either 
the operands or the results of a floating-point operation in its normal way. 
The FPU responds by generating an exception to initiate a software trap or 
by setting a status flag. 
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7.1 Exception Types 

The FP Control/Status register described in Chapter 6 contains an Enable bit 
for each exception type; exception Enable bits determine whether an 
exception will cause the FPU to initiate a trap or set a status flag. 

• If a trap is taken, the FPU remains in the state found at the 
beginning of the operation and a software exception handling 
routine executes. 

• If no trap is taken, an appropriate value is written into the FPU 
destination register and execution continues. 

The FPU supports the five IEEE Standard 754 exceptions: 

• Inexact (I) 

• Underflow (U) 

• Overflow (O) 

• Division by Zero (Z) 

• Invalid Operation (V) 

Cause bits, Enables, and Flag bits (status flags) are used. 

The FPU adds a sixth exception type, Unimplemented Operation (E), to 
use when the FPU cannot implement the standard MIPS floating-point 
architecture, including cases in which the FPU cannot determine the 
correct exception behavior. This exception indicates the use of a software 
implementation. The Unimplemented Operation exception has no Enable 
or Flag bit; whenever this exception occurs, an unimplemented exception 
trap is taken (if the FPU interrupt input to the CPU is enabled). 

Figure 7-1 illustrates the Control/Status register bits that support 
exceptions. 

Figure 7-1 Control/Status Register Exception/Flag/Trap/Enable Bits 

Each of the five IEEE Standard 754 exceptions (V, Z, O, U, I) is associated 
with a trap under user control, and is enabled by setting one of the five 
Enable bits. When an exception occurs, the corresponding Cause bit is set. 
If the corresponding Enable bit is not set, the Flag bit is also set. If the 
corresponding Enable bit is set, the Flag bit is not set and the FPU generates 
an interrupt to the CPU. Subsequent exception processing allows a trap to 
be taken. 
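
The interaction of the Cause, Enable, and Flag bits described above can be sketched as follows. The bit positions are an assumption based on the Control/Status register layout given in Chapter 6 (Cause field starting at bit 12, Enables at bit 7, Flags at bit 2, with I in the lowest position of each field); the function name and return convention are hypothetical.

```python
# Assumed field positions in the FP Control/Status register (see Chapter 6).
CAUSE_SHIFT, ENABLE_SHIFT, FLAG_SHIFT = 12, 7, 2
# Position of each IEEE exception within a field, lowest bit first.
EXC_INDEX = {"I": 0, "U": 1, "O": 2, "Z": 3, "V": 4}


def signal_ieee_exception(fcsr: int, exc: str) -> tuple[int, bool]:
    """Return (updated register value, True if a trap is requested)."""
    idx = EXC_INDEX[exc]
    fcsr |= 1 << (CAUSE_SHIFT + idx)           # the Cause bit is always set
    enabled = bool(fcsr & (1 << (ENABLE_SHIFT + idx)))
    if not enabled:
        fcsr |= 1 << (FLAG_SHIFT + idx)        # Flag bit is set only when no trap
    return fcsr, enabled
```

With all Enable bits clear, a Division-by-Zero exception sets both the Z Cause bit and the Z Flag bit and requests no trap; with the Z Enable bit set, only the Cause bit is set and a trap is requested.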



7.2 Exception Trap Processing 

When a floating-point exception trap is taken, the Cause register indicates 
the floating-point coprocessor is the cause of the exception trap. The 
Floating-Point Exception (FPE) code is used, and the Cause bits of the 
floating-point Control/Status register indicate the reason for the floating- 
point exception. These bits are, in effect, an extension of the system 
coprocessor Cause register. 
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7.3 Flags 

A Flag bit is provided for each IEEE exception. This Flag bit is set to a 1 on 
the assertion of its corresponding exception, with no corresponding 
exception trap signaled. 

The Flag bit is reset by writing a new value into the Status register; flags 
can be saved and restored by software either individually or as a group. 

When no exception trap is signaled, the floating-point coprocessor takes a 
default action, providing a substitute value for the exception-causing 
result of the floating-point operation. The particular default action taken 
depends upon the type of exception. Table 7-1 lists the default action taken 
by the FPU for each of the IEEE exceptions. 



Table 7-1 Default FPU Exception Actions 

Field  Description          Rounding  Default action
                            Mode
I      Inexact exception    Any       Supply a rounded result
U      Underflow exception  RN        Modify underflow values to 0 with the sign of
                                      the intermediate result
                            RZ        Modify underflow values to 0 with the sign of
                                      the intermediate result
                            RP        Modify positive underflows to the format's
                                      smallest positive finite number; modify
                                      negative underflows to -0
                            RM        Modify negative underflows to the format's
                                      smallest negative finite number; modify
                                      positive underflows to 0
O      Overflow exception   RN        Modify overflow values to ∞ with the sign of
                                      the intermediate result
                            RZ        Modify overflow values to the format's largest
                                      finite number with the sign of the
                                      intermediate result
                            RP        Modify negative overflows to the format's most
                                      negative finite number; modify positive
                                      overflows to +∞
                            RM        Modify positive overflows to the format's
                                      largest finite number; modify negative
                                      overflows to -∞
Z      Division by zero     Any       Supply a properly signed ∞
V      Invalid operation    Any       Supply a quiet Not a Number (NaN)
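
The overflow rows of Table 7-1 can be expressed as a small lookup. The sketch below is illustrative only: the function name and its parameters are hypothetical, and the double-precision maximum from the Python runtime stands in for "the format's largest finite number".

```python
import math
import sys


def overflow_default(rounding_mode: str, negative: bool,
                     largest_finite: float = sys.float_info.max) -> float:
    """Default result for an untrapped overflow, following Table 7-1."""
    if rounding_mode == "RN":    # infinity with the sign of the result
        return -math.inf if negative else math.inf
    if rounding_mode == "RZ":    # largest finite number with the sign of the result
        return -largest_finite if negative else largest_finite
    if rounding_mode == "RP":    # toward plus infinity
        return -largest_finite if negative else math.inf
    if rounding_mode == "RM":    # toward minus infinity
        return -math.inf if negative else largest_finite
    raise ValueError(f"unknown rounding mode: {rounding_mode}")
```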
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The FPU detects the eight exception causes internally. When the FPU 
encounters one of these unusual situations, it causes either an IEEE 
exception or an Unimplemented Operation exception (E). 

Table 7-2 lists the exception-causing situations and contrasts the behavior 
of the FPU with the requirements of the IEEE Standard 754. 



Table 7-2 FPU Exception-Causing Conditions 

FPA Internal Result    IEEE Standard  Trap    Trap     Notes
                       754            Enable  Disable
Inexact result         I              I       I        Loss of accuracy
Exponent overflow      O,I†           O,I     O,I      Normalized exponent > Emax
Division by zero       Z              Z       Z        Zero is (exponent = Emin-1,
                                                       mantissa = 0)
Overflow on convert    V              E       E        Source out of integer range
Signaling NaN source   V              V       V
Invalid operation      V              V       V        0/0, etc.
Exponent underflow     U              E       U,I‡     Normalized exponent < Emin
Denormalized or QNaN   None           E       E        Denormalized is (exponent =
                                                       Emin-1 and mantissa <> 0)

† The IEEE Standard 754 specifies an inexact exception on overflow only if the 
overflow trap is disabled. 

‡ Exponent underflow sets the U and I Cause bits if both the U and I Enable 
bits are not set and the FS bit is set; otherwise exponent underflow sets the 
E Cause bit. 

7.4 FPU Exceptions 

The following sections describe the conditions that cause the FPU to 
generate each of its exceptions, and detail the FPU response to each 
exception-causing condition. 

Inexact Exception (I) 

The FPU generates the Inexact exception if one of the following occurs: 

• the rounded result of an operation is not exact, or 

• the rounded result of an operation overflows, or 

• the rounded result of an operation underflows and both the 
Underflow and Inexact Enable bits are not set and the FS bit is 
set. 

The FPU usually examines the operands of floating-point operations 
before execution actually begins, to determine (based on the exponent 
values of the operands) if the operation can possibly cause an exception. If 
there is a possibility of an instruction causing an exception trap, the FPU 
uses a coprocessor stall to execute the instruction. 

It is impossible, however, for the FPU to predetermine if an instruction will 
produce an inexact result. If Inexact exception traps are enabled, the FPU 
uses the coprocessor stall mechanism to execute all floating-point 
operations that require more than one cycle. Since this mode of execution 
can impact performance, Inexact exception traps should be enabled only 
when necessary. 

Trap Enabled Results: If Inexact exception traps are enabled, the result 
register is not modified and the source registers are preserved. 

Trap Disabled Results: The rounded or overflowed result is delivered to 
the destination register if no other software trap occurs. 
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Invalid Operation Exception (V) 

The Invalid Operation exception is signaled if one or both of the operands 
are invalid for an implemented operation. When the exception occurs 
without a trap, the MIPS ISA defines the result as a quiet Not a Number 
(NaN). The invalid operations are: 

• Addition or subtraction: magnitude subtraction of infinities, 
such as: (+∞) + (-∞) or (-∞) - (-∞) 

• Multiplication: 0 times ∞, with any signs 

• Division: 0/0, or ∞/∞, with any signs 

• Comparison of predicates involving < or > without ?, when the 
operands are unordered 

• Comparison or a Convert From Floating-point Operation on a 
signaling NaN. 

• Any arithmetic operation on a signaling NaN. A move (MOV) 
operation is not considered to be an arithmetic operation, but 
absolute value (ABS) and negate (NEG) are considered to be 
arithmetic operations and cause this exception if one or both 
operands is a signaling NaN. 

• Square root: √x, where x is less than zero 
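
The arithmetic cases in the list above can be sketched as a predicate over operand values. This is an illustration only: the function name and operation labels are hypothetical, and signaling NaNs and unordered comparisons are not modelled because Python floats do not distinguish signaling from quiet NaNs.

```python
import math


def is_invalid_operation(op: str, x: float, y: float = 0.0) -> bool:
    """True if op(x, y) is one of the invalid arithmetic cases listed above."""
    if op in ("add", "sub"):
        if math.isinf(x) and math.isinf(y):
            same_sign = (x > 0) == (y > 0)
            # Magnitude subtraction of infinities: inf + (-inf) or inf - inf.
            return (op == "add" and not same_sign) or (op == "sub" and same_sign)
        return False
    if op == "mul":
        return (x == 0 and math.isinf(y)) or (math.isinf(x) and y == 0)
    if op == "div":
        return (x == 0 and y == 0) or (math.isinf(x) and math.isinf(y))
    if op == "sqrt":
        return x < 0  # -0.0 is not less than zero, so it is not invalid
    raise ValueError(f"unsupported operation: {op}")
```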

Software can simulate the Invalid Operation exception for other 
operations that are invalid for the given source operands. Examples of 
these operations include IEEE Standard 754-specified functions 
implemented in software, such as Remainder: x REM y, where y is 0 or x is 
infinite; conversion of a floating-point number to a decimal format whose 
value causes an overflow, is infinity, or is NaN; and transcendental 
functions, such as ln(-5) or cos^-1(3). Refer to Appendix B for examples or 
for routines to handle these cases. 

Trap Enabled Results: The original operand values are undisturbed. 

Trap Disabled Results: A quiet NaN is delivered to the destination 
register if no other software trap occurs. 
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Division-by-Zero Exception (Z) 

The Division-by-Zero exception is signaled on an implemented divide 
operation if the divisor is zero and the dividend is a finite nonzero number. 
Software can simulate this exception for other operations that produce a 
signed infinity, such as ln(0), sec(π/2), csc(0), or 0^-1. 

Trap Enabled Results: The result register is not modified, and the source 
registers are preserved. 

Trap Disabled Results: The result, when no trap occurs, is a correctly 
signed infinity. 

Overflow Exception (O) 

The Overflow exception is signaled when the magnitude of the rounded 
floating-point result, with an unbounded exponent range, is larger than 
the largest finite number of the destination format. (This exception also 
sets the Inexact exception and Flag bits.) 

Trap Enabled Results: The result register is not modified, and the source 
registers are preserved. 

Trap Disabled Results: The result, when no trap occurs, is determined by 
the rounding mode and the sign of the intermediate result (as listed in 
Table 7-1). 
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Underflow Exception (U) 

Two related events contribute to the Underflow exception: 

• creation of a tiny nonzero result between ±2^Emin which can 
cause some later exception because it is so tiny 

• extraordinary loss of accuracy during the approximation of 
such tiny numbers by denormalized numbers. 

IEEE Standard 754 allows a variety of ways to detect these events, but 
requires they be detected the same way for all operations. 

Tininess can be detected by one of the following methods: 

• after rounding (when a nonzero result, computed as though 
the exponent range were unbounded, would lie strictly 
between ±2^Emin) 

• before rounding (when a nonzero result, computed as though 
the exponent range and the precision were unbounded, would 
lie strictly between ±2^Emin). 

The MIPS architecture requires that tininess be detected after rounding. 

Loss of accuracy can be detected by one of the following methods: 

• denormalization loss (when the delivered result differs 
from what would have been computed if the exponent 
range were unbounded) 

• inexact result (when the delivered result differs from what 
would have been computed if the exponent range and 
precision were both unbounded). 

The MIPS architecture requires that loss of accuracy be detected as an 
inexact result. 

Trap Enabled Results: If Underflow or Inexact traps are enabled, or if the 
FS bit is not set, then an Unimplemented exception (E) is generated, and 
the result register is not modified. 

Trap Disabled Results: If Underflow and Inexact traps are not enabled 
and the FS bit is set, the result is determined by the rounding mode and 
the sign of the intermediate result (as listed in Table 7-1). 
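
The two underflow outcomes described above can be sketched as a decision function. This is a simplified illustration, not the hardware logic; the function name and the representation of Cause bits as a set of letters are hypothetical.

```python
def underflow_cause_bits(u_enable: bool, i_enable: bool, fs: bool) -> set[str]:
    """Return the Cause bits set when a result underflows.

    If either the Underflow or Inexact trap is enabled, or FS is clear,
    the FPU raises Unimplemented Operation (E) and leaves the result
    register unmodified. Otherwise U and I are set and the default
    result from Table 7-1 is delivered.
    """
    if u_enable or i_enable or not fs:
        return {"E"}
    return {"U", "I"}
```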
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Unimplemented Instruction Exception (E) 

Any attempt to execute an instruction with an operation code or format 
code that has been reserved for future definition sets the Unimplemented bit 
in the Cause field in the FPU Control/Status register and traps. The operand 
and destination registers remain undisturbed and the instruction is 
emulated in software. Any of the IEEE Standard 754 exceptions can arise 
from the emulated operation, and these exceptions in turn are simulated. 

The Unimplemented Instruction exception can also be signaled when 
unusual operands or result conditions are detected that the implemented 
hardware cannot handle properly. These include: 

• Denormalized operand, except for Compare instruction 

• Quiet Not a Number operand, except for Compare instruction 

• Denormalized result or Underflow, when either Underflow or 
Inexact Enable bits are set or the FS bit is not set. 

• Reserved opcodes 

• Unimplemented formats 

• Operations which are invalid for their format (for instance, 
CVT.S.S) 

NOTE: Denormalized and NaN operands are only trapped if the 
instruction is a convert or computational operation. Moves do not trap 
if their operands are either denormalized or NaNs. 

The use of this exception for such conditions is optional; most of these 
conditions are newly developed and are not expected to be widely used in 
early implementations. Loopholes are provided in the architecture so that 
these conditions can be implemented with assistance provided by 
software, maintaining full compatibility with the IEEE Standard 754. 

Trap Enabled Results: The original operand values are undisturbed. 

Trap Disabled Results: This trap cannot be disabled. 
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7.5 Saving and Restoring State 

Sixteen doubleword coprocessor load or store operations save or restore 
the coprocessor floating-point register state in memory. The remainder of 
control and status information can be saved or restored through Move To/ 
From Coprocessor Control Register instructions, and saving and restoring 
the processor registers. Normally, the Control/Status register is saved first 
and restored last. 

When the coprocessor Control/Status register (FCR31) is read, and the 
coprocessor is executing one or more floating-point instructions, the 
instruction(s) in progress are either completed or reported as exceptions. 
The architecture requires that no more than one of these pending 
instructions can cause an exception. If the pending instruction cannot be 
completed, this instruction is placed in the Exception register, if present. 
Information indicating the type of exception is placed in the Control/Status 
register. When state is restored, state information in the status word 
indicates that exceptions are pending. 

Writing a zero value to the Cause field of Control/Status register clears all 
pending exceptions, permitting normal processing to restart after the 
floating-point register state is restored. 
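
Clearing the Cause field can be sketched as a mask operation on a saved copy of the register. The field position (six bits starting at bit 12) is an assumption based on the Control/Status register layout in Chapter 6, and the names below are hypothetical; on the processor the value would be written back with a Move To Coprocessor Control Register instruction.

```python
# Assumed position of the six-bit Cause field (E, V, Z, O, U, I) in FCR31.
CAUSE_MASK = 0x3F << 12


def clear_pending_exceptions(fcsr: int) -> int:
    """Return the 32-bit register value with the Cause field zeroed."""
    return fcsr & ~CAUSE_MASK & 0xFFFFFFFF
```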

The Cause field of the Control/Status register holds the results of only one 
instruction; the FPU examines source operands before an operation is 
initiated to determine if this instruction can possibly cause an exception. If 
an exception is possible, the FPU executes the instruction in stall mode to 
ensure that no more than one instruction (that might cause an exception) 
is executed at a time. 
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7.6 Trap Handlers for IEEE Standard 754 Exceptions 

The IEEE Standard 754 strongly recommends that users be allowed to 
specify a trap handler for any of the five standard exceptions; the trap 
handler can either compute or specify a substitute result to be placed in 
the destination register of the operation. 

By retrieving an instruction using the processor Exception Program Counter 
(EPC) register, the trap handler determines: 

• exceptions occurring during the operation 

• the operation being performed 

• the destination format 

On Overflow or Underflow exceptions (except for conversions), and on 
Inexact exceptions, the trap handler gains access to the correctly rounded 
result by examining source registers and simulating the operation in 
software. 

On Overflow or Underflow exceptions encountered on floating-point 
conversions, and on Invalid Operation and Divide-by-Zero exceptions, the 
trap handler gains access to the operand values by examining the source 
registers of the instruction. 

The IEEE Standard 754 recommends that, if enabled, the overflow and 
underflow traps take precedence over a separate inexact trap. This 
prioritization is accomplished in software; hardware sets the bits for both 
the Inexact exception and the Overflow or Underflow exception. 
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This chapter describes the signals used by and in conjunction with the 
R4000 processor. The signals include the System interface, the 
Clock/Control interface, the Secondary Cache interface, the Interrupt 
interface, the Joint Test Action Group (JTAG) interface, and the 
Initialization interface. 

Signals are listed in bold, and low active signals have a trailing asterisk — 
for instance, the low-active Read Ready signal is RdRdy*. The signal 
description also tells if the signal is an input (the processor receives it) or 
output (the processor sends it out). 

Figure 8-1 illustrates the functional groupings of the processor signals. 
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SysAD(63:0) 

SysADC(7:0) 
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SCData (127:0) 
SCDChk(15:0) 
SCTag (24:0) 
SCTChk (6:0) 
SCAddr(17:1) 
SCAddr0 (w,x,y,z) 
SCAPar(2:0) 
SCOE* 

SCWr(w,x,y,z)* 
SCDCS* 

SCTCS* 

Int(5:1)* (2) 

Int0* 

NMI* 



ModeClock 

ModeIn 

VCCOk 

ColdReset* 

Reset* 

JTDI 
JTDO 
JTMS 
JTCK 



(1) = R4000SC and R4000MC only (2) = R4000PC only 
(3) = R4000MC only (4) = R4400 only 



Figure 8-1 R4000 Processor Signals 
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8.1 System Interface Signals 

System interface signals provide the connection between the R4000 
processor and the other components in the system. IvdAck* and IvdErr* 
signals are applicable only on R4000MC; on the R4000SC they must be tied 
to Vcc. The remaining signals are available on all three of the package 
configurations. 

Table 8-1 lists the system interface signals. 



Table 8-1 System Interface Signals

Name | Definition | Direction
    Description

ExtRqst* | External request | Input
    An external agent asserts ExtRqst* to request use of the System
    interface. The processor grants the request by asserting Release*.

IvdAck* | Invalidate acknowledge | Input
    An external agent asserts IvdAck* to signal successful completion of
    a processor invalidate or update request (R4000MC only; tie to Vcc
    on R4000SC).

IvdErr* | Invalidate error | Input
    An external agent asserts IvdErr* to signal unsuccessful completion
    of a processor invalidate or update request (R4000MC only; tie to
    Vcc on R4000SC).

Release* | Release interface | Output
    In response to the assertion of ExtRqst*, the processor asserts
    Release*, signalling to the requesting device that the System
    interface is available.

RdRdy* | Read ready | Input
    The external agent asserts RdRdy* to indicate that it can accept
    processor read, invalidate, or update requests in both
    secondary-cache and no-secondary-cache mode; or can accept a read
    followed by write request, a read followed by a potential update
    request, or a read followed by a potential update followed by a
    write request in secondary cache mode.

SysAD(63:0) | System address/data bus | Input/Output
    A 64-bit address and data bus for communication between the
    processor and an external agent.

SysADC(7:0) | System address/data check bus | Input/Output
    An 8-bit bus containing check bits for the SysAD bus.†

SysCmd(8:0) | System command/data identifier | Input/Output
    A 9-bit bus for command and data identifier transmission between
    the processor and an external agent.

SysCmdP | System command/data identifier bus parity | Input/Output
    A single, even-parity bit for the SysCmd bus. When the System
    interface is set to parity mode, the processor also indicates a
    secondary cache ECC error by corrupting the state of the SysCmdP
    signal.

ValidIn* | Valid input | Input
    The external agent asserts ValidIn* when it is driving a valid
    address or data on the SysAD bus and a valid command or data
    identifier on the SysCmd bus.

ValidOut* | Valid output | Output
    The processor asserts ValidOut* when it is driving a valid address
    or data on the SysAD bus and a valid command or data identifier on
    the SysCmd bus.

WrRdy* | Write ready | Input
    An external agent asserts WrRdy* when it can accept a processor
    write request.


† The SysADC(7:0) bits map to the SysAD bus in this manner: SysADC(7) covers
SysAD(63:56), SysADC(6) covers SysAD(55:48), and so on down to SysADC(0), which
covers SysAD(7:0).
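
The byte-lane mapping in the footnote and the even-parity rule stated for SysCmdP can be sketched in a few lines of Python. This is only an illustrative model of the arithmetic, not a description of the processor logic; the function names are hypothetical.

```python
def sysadc_lane(check_bit):
    """Return (msb, lsb) of the SysAD byte lane covered by SysADC(check_bit)."""
    if not 0 <= check_bit <= 7:
        raise ValueError("SysADC has bits 7..0")
    lsb = 8 * check_bit
    return lsb + 7, lsb


def even_parity_bit(value, width):
    """Bit that makes the number of 1s in value plus the parity bit even."""
    if value < 0 or value >> width:
        raise ValueError("value does not fit in the given width")
    return bin(value).count("1") & 1


def syscmdp(syscmd):
    """Even-parity bit for the 9-bit SysCmd bus."""
    return even_parity_bit(syscmd, 9)
```

For example, `sysadc_lane(6)` gives the lane SysAD(55:48), matching the footnote.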

8.2 Clock/Control Interface Signals

The Clock/Control interface signals make up the interface for clocking
and maintenance. Table 8-2 lists the Clock/Control interface signals.



Table 8-2 Clock/Control Interface Signals

Name | Definition | Direction
    Description

IOOut | I/O output | Output
    Output slew rate control feedback loop output. Must be connected to
    IOIn through a delay loop that models the I/O path from the
    processor to an external agent.

IOIn | I/O input | Input
    Output slew rate control feedback loop input (see IOOut).

MasterClock | Master clock | Input
    Master clock input that establishes the processor operating
    frequency.

MasterOut | Master clock out | Output
    Master clock output aligned with MasterClock.

RClock(1:0) | Receive clocks | Output
    Two identical receive clocks that establish the System interface
    frequency.

SyncOut | Synchronization clock out | Output
    Synchronization clock output. Must be connected to SyncIn through an
    interconnect that models the interconnect between MasterOut, TClock,
    RClock, and the external agent.

SyncIn | Synchronization clock in | Input
    Synchronization clock input.

TClock(1:0) | Transmit clocks | Output
    Two identical transmit clocks that establish the System interface
    frequency.

Fault* | Fault | Output
    The processor asserts Fault* to indicate a mismatch output of the
    boundary comparators, or a System interface input parity or ECC
    error.

Status(7:0) | Status | Output
    An 8-bit bus that indicates the current operational status of the
    processor. R4400 only.

VccP | Quiet Vcc for PLL | Input
    Quiet Vcc for the internal phase-locked loop.

VccSense | Vcc sense | Input/Output
    A special pin used only in component testing and characterization.
    VccSense provides a separate, direct connection from the on-chip Vcc
    node to a package pin, without connecting to the in-package power
    planes. Test fixtures treat VccSense as an analog output pin; the
    voltage at this pin directly exhibits the behavior of the on-chip
    Vcc. Thus, characterization engineers can easily observe the effects
    of Δi/Δt noise, transmission line reflections, etc. VccSense should
    be connected to Vcc in functional system designs.

VssP | Quiet Vss for PLL | Input
    Quiet Vss for the internal phase-locked loop.

VssSense | Vss sense | Input/Output
    VssSense provides a separate, direct connection from the on-chip Vss
    node to a package pin without having to connect to the in-package
    ground planes. VssSense should be connected to Vss in functional
    system designs.

8.3 Secondary Cache Interface Signals

Secondary Cache interface signals constitute the interface between the 
R4000 processor and secondary cache. These signals are available only on 
the R4000MC and R4000SC. Table 8-3 lists the Secondary Cache interface 
signals. 



Table 8-3 Secondary Cache Interface Signals

Name | Definition | Direction
    Description

SCAddr(17:1) | Secondary cache address bus | Output
    The 18-bit address bus for the secondary cache. Bit 0 has four
    output lines, (SCAddr0W:Z), to provide additional drive current.

SCAddr0W | Secondary cache address LSB | Output

SCAddr0X | Secondary cache address LSB | Output

SCAddr0Y | Secondary cache address LSB | Output

SCAddr0Z | Secondary cache address LSB | Output

SCAPar(2:0) | Secondary cache address parity bus | Output
    A 3-bit bus that carries the parity of the SCAddr bus and the cache
    control line SCWr*. The individual bit definitions are:

SCAPar2 | Secondary cache address parity bus | Output
    Even parity for SCAddr(17:12) and SCWr*

SCAPar1 | Secondary cache address parity bus | Output
    Even parity for SCAddr(11:6) and SCDCS*

SCAPar0 | Secondary cache address parity bus | Output
    Even parity for SCAddr(5:0) and SCTCS*

SCData(127:0) | Secondary cache data bus | Input/Output
    A 128-bit bus used to read or write cache data from and to the
    secondary cache data RAM.

SCDChk(15:0) | Secondary cache data ECC bus | Input/Output
    A 16-bit bus that carries two 8-bit ECC fields that cover the 128
    bits of SCData from/to secondary cache. SCDChk(15:8) corresponds to
    SCData(127:64) and SCDChk(7:0) corresponds to SCData(63:0).

SCDCS* | Secondary cache data chip select | Output
    Chip select enable signal for the secondary cache data RAM.

SCOE* | Secondary cache output enable | Output
    Output enable for the secondary cache data and tag RAM.

SCTag(24:0) | Secondary cache tag bus | Input/Output
    A 25-bit bus used to read or write cache tags from and to the
    secondary cache.

SCTChk(6:0) | Secondary cache tag ECC bus | Input/Output
    A 7-bit bus that carries an ECC field covering the SCTag from and to
    the secondary cache.

SCTCS* | Secondary cache tag chip select | Output
    Chip select enable signal for the secondary cache tag RAM.

SCWrW* | Secondary cache write enable | Output
    Write enable for the secondary cache data and tag RAM.

SCWrX* | Secondary cache write enable | Output
    Write enable for the secondary cache data and tag RAM.

SCWrY* | Secondary cache write enable | Output
    Write enable for the secondary cache data and tag RAM.

SCWrZ* | Secondary cache write enable | Output
    Write enable for the secondary cache data and tag RAM.
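
The three address-parity groups listed for SCAPar(2:0) can be sketched as follows. This is a minimal model under two assumptions that the table does not state outright: that the three groups are listed from SCAPar(2) down to SCAPar(0), and that parity is taken over the electrical level (0 or 1) of each active-low control pin. The function names are hypothetical.

```python
def even_parity(bits):
    """Bit that makes the number of 1s in bits plus the parity bit even."""
    return sum(bits) & 1


def scapar(scaddr, scwr_level, scdcs_level, sctcs_level):
    """Return (SCAPar2, SCAPar1, SCAPar0) for an 18-bit SCAddr value.

    scaddr is SCAddr(17:0) as an integer; the *_level arguments are the
    pin levels (0 or 1) of SCWr*, SCDCS*, and SCTCS* (assumption).
    """
    if not 0 <= scaddr < (1 << 18):
        raise ValueError("SCAddr is an 18-bit value")

    def field(hi, lo):
        # Bits lo..hi of scaddr as a list of 0/1 values.
        return [(scaddr >> i) & 1 for i in range(lo, hi + 1)]

    par2 = even_parity(field(17, 12) + [scwr_level])
    par1 = even_parity(field(11, 6) + [scdcs_level])
    par0 = even_parity(field(5, 0) + [sctcs_level])
    return par2, par1, par0
```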

8.4 Interrupt Interface Signals

The Interrupt interface signals make up the interface used by external
agents to interrupt the R4000 processor. Int*(5:1) are available only on the
R4000PC; Int*(0) and NMI* are available on all three configurations. Table
8-4 lists the Interrupt interface signals.



Table 8-4 Interrupt Interface Signals

Name | Definition | Direction
    Description

Int*(5:1) | Interrupt | Input
    Five of six general processor interrupts, bit-wise ORed with bits
    5:1 of the interrupt register. R4000PC only.

Int*(0) | Interrupt | Input
    One of six general processor interrupts, bit-wise ORed with bit 0 of
    the interrupt register.

NMI* | Nonmaskable interrupt | Input
    Nonmaskable interrupt, ORed with bit 6 of the interrupt register.



8.5 JTAG Interface Signals 

The JTAG interface signals make up the interface that provides the JTAG
boundary scan mechanism. Table 8-5 lists the JTAG interface signals.

Table 8-5 JTAG Interface Signals

Name | Definition | Direction
    Description

JTDI | JTAG data in | Input
    Data is serially scanned in through this pin.

JTCK | JTAG clock input | Input
    The processor receives a serial clock on JTCK. On the rising edge of
    JTCK, both JTDI and JTMS are sampled.

JTDO | JTAG data out | Output
    Data is serially scanned out through this pin.

JTMS | JTAG command | Input
    JTAG command signal, indicating the incoming serial data is command
    data.

8.6 Initialization Interface Signals

The Initialization interface signals make up the interface by which an 
external agent initializes the processor operating parameters. These 
signals are available on each of the three processor configurations. Table 
8-6 lists the Initialization interface signals. 



Table 8-6 Initialization Interface Signals

Name | Definition | Direction
    Description

ColdReset* | Cold reset | Input
    This signal must be asserted for a power-on reset or a cold reset.
    The clocks SClock, TClock, and RClock begin to cycle and are
    synchronized with the deasserted edge of ColdReset*. ColdReset* must
    be deasserted synchronously with MasterOut.

ModeClock | Boot mode clock | Output
    Serial boot-mode data clock output; runs at the system clock
    frequency divided by 256: (MasterClock/256).

ModeIn | Boot mode data in | Input
    Serial boot-mode data input.

Reset* | Reset | Input
    This signal must be asserted for any reset sequence. It can be
    asserted synchronously or asynchronously for a cold reset, or
    synchronously to initiate a warm† reset. Reset* must be deasserted
    synchronously with MasterOut.

VCCOk | Vcc is OK | Input
    When asserted, this signal indicates to the processor that the +5
    volt power supply has been above 4.75 volts for more than 100
    milliseconds and will remain stable. The assertion of VCCOk
    initiates the initialization sequence.

† A warm reset restarts the processor, but does not affect clocks; it
preserves the processor internal state. A description of warm reset is
given in Chapter 9.

8.7 Signal Summary

Table 8-7 R4000SC/MC Processor Signal Summary

Description | Name | I/O | Asserted State | 3-State

Secondary cache data bus | SCData(127:0) | I/O | High | Yes
Secondary cache data ECC bus | SCDChk(15:0) | I/O | High | Yes
Secondary cache tag bus | SCTag(24:0) | I/O | High | Yes
Secondary cache tag ECC bus | SCTChk(6:0) | I/O | High | Yes
Secondary cache address bus | SCAddr(17:1) | O | High | No
Secondary cache address LSB | SCAddr0Z | O | High | No
Secondary cache address LSB | SCAddr0Y | O | High | No
Secondary cache address LSB | SCAddr0X | O | High | No
Secondary cache address LSB | SCAddr0W | O | High | No
Secondary cache address parity bus | SCAPar(2:0) | O | High | No
Secondary cache output enable | SCOE* | O | Low | No
Secondary cache write enable | SCWrZ* | O | Low | No
Secondary cache write enable | SCWrY* | O | Low | No
Secondary cache write enable | SCWrX* | O | Low | No
Secondary cache write enable | SCWrW* | O | Low | No
Secondary cache data chip select | SCDCS* | O | Low | No
Secondary cache tag chip select | SCTCS* | O | Low | No
System address/data bus | SysAD(63:0) | I/O | High | Yes
System address/data check bus | SysADC(7:0) | I/O | High | Yes
System command/data identifier bus | SysCmd(8:0) | I/O | High | Yes
System command/data identifier bus parity | SysCmdP | I/O | High | Yes
Valid input | ValidIn* | I | Low | No
Valid output | ValidOut* | O | Low | No
External request | ExtRqst* | I | Low | No
Release interface | Release* | O | Low | No
Read ready | RdRdy* | I | Low | No
Write ready | WrRdy* | I | Low | No
Invalidate acknowledge | IvdAck* | I | Low | No
Invalidate error | IvdErr* | I | Low | No
Interrupt | Int*(0) | I | Low | No
Nonmaskable interrupt | NMI* | I | Low | No
Boot mode data in | ModeIn | I | High | No
Boot mode clock | ModeClock | O | High | No
JTAG data in | JTDI | I | High | No
JTAG data out | JTDO | O | High | No
JTAG command | JTMS | I | High | No
JTAG clock input | JTCK | I | High | No
Transmit clocks | TClock(1:0) | O | High | No
Receive clocks | RClock(1:0) | O | High | No
Master clock | MasterClock | I | High | No
Master clock out | MasterOut | O | High | No
Synchronization clock out | SyncOut | O | High | No
Synchronization clock in | SyncIn | I | High | No
I/O output | IOOut | O | High | No
I/O input | IOIn | I | High | No
Vcc is OK | VCCOk | I | High | No
Cold reset | ColdReset* | I | Low | No
Reset | Reset* | I | Low | No
Fault | Fault* | O | Low | No
Quiet Vcc for PLL | VccP | I | High | No
Quiet Vss for PLL | VssP | I | High | No
Status | Status(7:0) | O | High | No
Vcc sense | VccSense | I/O | N/A | No
Vss sense | VssSense | I/O | N/A | No

Table 8-8 R4000PC Processor Signal Summary



Description | Name | I/O | Asserted State | 3-State

System address/data bus | SysAD(63:0) | I/O | High | Yes
System address/data check bus | SysADC(7:0) | I/O | High | Yes
System command/data identifier bus | SysCmd(8:0) | I/O | High | Yes
System command/data identifier bus parity | SysCmdP | I/O | High | Yes
Valid input | ValidIn* | I | Low | No
Valid output | ValidOut* | O | Low | No
External request | ExtRqst* | I | Low | No
Release interface | Release* | O | Low | No
Read ready | RdRdy* | I | Low | No
Write ready | WrRdy* | I | Low | No
Interrupts | Int*(5:1) | I | Low | No
Interrupt | Int*(0) | I | Low | No
Nonmaskable interrupt | NMI* | I | Low | No
Boot mode data in | ModeIn | I | High | No
Boot mode clock | ModeClock | O | High | No
JTAG data in | JTDI | I | High | No
JTAG data out | JTDO | O | High | No
JTAG command | JTMS | I | High | No
JTAG clock input | JTCK | I | High | No
Transmit clocks | TClock(1:0) | O | High | No
Receive clocks | RClock(1:0) | O | High | No
Master clock | MasterClock | I | High | No
Master clock out | MasterOut | O | High | No
Synchronization clock out | SyncOut | O | High | No
Synchronization clock in | SyncIn | I | High | No
I/O output | IOOut | O | High | No
I/O input | IOIn | I | High | No
Vcc is OK | VCCOk | I | High | No
Cold reset | ColdReset* | I | Low | No
Reset | Reset* | I | Low | No
Fault | Fault* | O | Low | No
Quiet Vcc for PLL | VccP | I | High | No
Quiet Vss for PLL | VssP | I | High | No

This chapter describes the R4000 Initialization interface. It covers the
reset signals and reset types, the initialization sequence with its signals
and timing dependencies, and the boot modes, which are set at initialization
time.

Signal names are listed in bold letters; for instance, the signal VCCOk
indicates that the +5 volt supply is stable. Low-active signals are
indicated by a trailing asterisk, such as ColdReset*, the power-on/cold
reset signal.

9.1 Functional Overview

The R4000 processor has the following three types of resets; they use the
VCCOk, ColdReset*, and Reset* input signals.

• Power-on reset: starts when the power supply is turned on and 
completely reinitializes the internal state machine of the 
processor without saving any state information. 

• Cold reset: restarts all clocks, but the power supply remains 
stable. A cold reset completely reinitializes the internal state 
machine of the processor without saving any state information. 

• Warm reset: restarts the processor, but does not affect clocks. A
warm reset preserves the processor internal state.

The operation of each type of reset is described in sections that follow. 
Refer to Figures 9-1, 9-2, and 9-3 later in this chapter for timing diagrams 
of the power-on, cold, and warm resets. 

The Initialization interface is a serial interface that operates at the 
frequency of the MasterClock divided by 256: (MasterClock/256). This 
low-frequency operation allows the initialization information to be stored 
in a low-cost EPROM. 

9.2 Reset Signal Description

This section describes the three reset signals, VCCOk, ColdReset*, and
Reset*, together with the boot-mode signals ModeIn and ModeClock.

VCCOk: When asserted†, VCCOk indicates to the processor that the +5
volt power supply (Vcc) has been above 4.75 volts for more than 100
milliseconds (ms) and is expected to remain stable. The assertion of
VCCOk initiates the reading of the boot-time mode control serial stream
(described in Initialization Sequence, in this chapter).

ColdReset*: The ColdReset* signal must be asserted (low) for either a
power-on reset or a cold reset. The clocks SClock, TClock, and RClock
begin to cycle and are synchronized with the deasserted edge (high) of
ColdReset*. ColdReset* must be deasserted synchronously with
MasterClock.

Reset*: The Reset* signal must be asserted for any reset sequence. It can
be asserted synchronously or asynchronously for a cold reset, or
synchronously to initiate a warm reset. Reset* must be deasserted
synchronously with MasterClock.

ModeIn: Serial boot-mode data input.

ModeClock: Serial boot-mode data clock output, at the MasterClock
frequency divided by 256 (MasterClock/256).



† Asserted means the signal is true, or in its valid state. For example, the
low-active Reset* signal is said to be asserted when it is in a low (true)
state; the high-active VCCOk signal is true when it is asserted high.

Power-on Reset

The sequence for a power-on reset is listed below.

1. Power-on reset applies a stable Vcc of at least 4.75 volts from the
+5 volt power supply to the processor. It also supplies a stable,
continuous system clock at the processor operational frequency.

2. After at least 100 ms of stable Vcc and MasterClock, the VCCOk
signal is asserted to the processor. The assertion of VCCOk
initializes the processor operating parameters. After the mode
bits have been read in, the processor allows its internal phase
locked loops to lock, stabilizing the processor internal clock,
PClock, the SyncOut-SyncIn clock path (described in Chapter
10), and the master clock output, MasterOut. Note that when
JTAG is not used, JTCK must be tied low at the rising edge of
VCCOk for the processor to properly reset. If JTAG is used, JTCK
may be toggled during power-up.

3. ColdReset* is asserted for at least 64K (2^16) MasterClock cycles
after the assertion of VCCOk. Once the processor reads the
boot-time mode control serial data stream, ColdReset* can be
deasserted. ColdReset* must be deasserted synchronously with
MasterClock.

4. The deassertion of ColdReset* synchronizes the rising edges of
SClock and TClock with the rising edge of the next MasterClock,
aligning SClock, TClock, and RClock (which is 90 degrees ahead
in phase of SClock and TClock) of all processors in a
multiprocessor system. However, these clocks are only
guaranteed to be stabilized 64 MasterClock cycles after
ColdReset* is deasserted.

5. After ColdReset* is deasserted synchronously and SClock,
TClock, and RClock have stabilized, Reset* is deasserted to allow
the processor to begin running. (Reset* must be held asserted for
at least 64 MasterClock cycles after the deassertion of
ColdReset*.) Reset* must be deasserted synchronously with
MasterClock.

NOTE: ColdReset* must be asserted when VCCOk asserts. The
behavior of the processor is undefined if VCCOk asserts while
ColdReset* is deasserted.
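
The timing constraints in the power-on sequence above can be collected into a small checker, for instance for use in a board-level simulation testbench. The sketch below is illustrative only: the `PowerOnEvents` record and `check_power_on` function are hypothetical names, and event times are assumed to be expressed in MasterClock cycles.

```python
from dataclasses import dataclass

VCC_STABLE_MIN_MS = 100.0          # stable Vcc and MasterClock before VCCOk
COLDRESET_MIN_CYCLES = 1 << 16     # 64K MasterClock cycles after VCCOk asserts
RESET_HOLD_MIN_CYCLES = 64         # Reset* hold after ColdReset* deasserts


@dataclass
class PowerOnEvents:
    vcc_stable_ms: float           # time Vcc was stable before VCCOk asserted
    vccok_assert: int              # cycle at which VCCOk asserts
    coldreset_deassert: int        # cycle at which ColdReset* deasserts
    reset_deassert: int            # cycle at which Reset* deasserts
    coldreset_low_at_vccok: bool = True


def check_power_on(ev):
    """Return a list of violated constraints (empty if the sequence is valid)."""
    problems = []
    if ev.vcc_stable_ms < VCC_STABLE_MIN_MS:
        problems.append("Vcc stable for less than 100 ms before VCCOk")
    if not ev.coldreset_low_at_vccok:
        problems.append("ColdReset* not asserted when VCCOk asserts")
    if ev.coldreset_deassert - ev.vccok_assert < COLDRESET_MIN_CYCLES:
        problems.append("ColdReset* asserted for fewer than 64K cycles")
    if ev.reset_deassert - ev.coldreset_deassert < RESET_HOLD_MIN_CYCLES:
        problems.append("Reset* held for fewer than 64 cycles after ColdReset*")
    return problems
```

The checker covers only the cycle counts and the NOTE; the requirement that both deassertions be synchronous with MasterClock is not modeled.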

Cold Reset

A cold reset can begin anytime after the processor has read the
initialization data stream, causing the processor to start with the Reset
exception. For information about saving processor states, see the
description of the Reset exception in Chapter 5.

A cold reset requires the same sequence as a power-on reset except that the
power is presumed to be stable before the assertion of the reset inputs and
the deassertion of VCCOk.

To begin the reset sequence, VCCOk must be deasserted for at least
64 MasterClock cycles before reassertion.

Warm Reset

To execute a warm reset, the Reset* input is asserted synchronously with
MasterClock. It is then held asserted for at least 64 MasterClock cycles
before being deasserted synchronously with MasterClock. The processor
internal clocks, PClock and SClock, and the System interface clocks,
TClock and RClock, are not affected by a warm reset. The boot-time
mode control serial data stream is not read by the processor on a warm
reset. A warm reset forces the processor to start with a Soft Reset
exception. For information about saving processor states, see the
description of the Soft Reset exception in Chapter 5.

The master clock output, MasterOut, can be used to generate any
reset-related signals for the processor that must be synchronous with
MasterClock.†

After a power-on reset, cold reset, or warm reset, all processor internal
state machines are reset, and the processor begins execution at the reset
vector. All processor internal states are preserved during a warm reset,
although the precise state of the caches depends on whether or not a cache
miss sequence has been interrupted by resetting the processor state
machines.



† Since MasterOut is undefined until after the serial PROM is read, reset
logic must not depend on MasterOut before the boot PROM is read.

9.3 Initialization Sequence

The boot-mode initialization sequence begins immediately after VCCOk
is asserted. As the processor reads the serial stream of 256 bits through the
ModeIn pin, the boot-mode bits initialize all fundamental processor
modes (the signals used are described in Chapter 8).

The initialization sequence is listed below.

1. The system deasserts the VCCOk signal. The ModeClock output
is held asserted.

2. The processor synchronizes the ModeClock output at the time
VCCOk is asserted. The first rising edge of ModeClock occurs
256 MasterClock cycles after VCCOk is asserted.

3. Each bit of the initialization stream is presented at the ModeIn pin
after each rising edge of the ModeClock. The processor samples
256 initialization bits from the ModeIn input.

Figures 9-1, 9-2, and 9-3 show the timing diagrams for the power-on,
cold, and warm resets.
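
As a worked example of the arithmetic behind this sequence, the sketch below computes the ModeClock frequency and the length of the 256-bit serial read. The 50 MHz MasterClock used in the usage note is an arbitrary example value, and the function names are hypothetical.

```python
MODE_CLOCK_DIVISOR = 256   # ModeClock = MasterClock / 256
BOOT_STREAM_BITS = 256     # bits sampled from the ModeIn pin


def mode_clock_hz(master_clock_hz):
    """ModeClock frequency for a given MasterClock frequency."""
    return master_clock_hz / MODE_CLOCK_DIVISOR


def boot_stream_cycles():
    """MasterClock cycles spanned by the stream at one bit per ModeClock period."""
    return BOOT_STREAM_BITS * MODE_CLOCK_DIVISOR


def boot_stream_seconds(master_clock_hz):
    """Approximate duration of the serial read, in seconds."""
    return boot_stream_cycles() / master_clock_hz
```

With a 50 MHz MasterClock, ModeClock runs at about 195.3 kHz and the stream takes roughly 1.31 ms. The cycle count, 256 x 256 = 65,536 = 2^16, is consistent with the 64K-cycle minimum assertion time given for ColdReset* in the power-on sequence.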

9.4 Boot-Mode Settings

Table 9-1 lists the processor boot-mode settings. The following rules apply 
to the boot-mode settings listed in this table: 

• Bit 0 of the stream is presented to the processor when VCCOk 
is first asserted. 

• Selecting a reserved value results in undefined processor 
behavior. 

• Bits 65 to 255 are reserved bits. 

• Zeros must be scanned in for all reserved bits. 



Table 9-1 Boot-Mode Settings

Serial Bit | Mode
    Value | Setting

0 | BlkOrder: Secondary Cache Mode block read response ordering
    0 | Sequential ordering
    1 | Subblock ordering

1 | EIBParMode: Specifies nature of System interface check bus
    0 | Single error correcting, double error detecting (SECDED) error
        checking and correcting mode
    1 | Byte parity

2 | EndBit: Specifies byte ordering
    0 | Little-endian ordering
    1 | Big-endian ordering

3 | DShMdDis: Dirty shared mode; enables the transition to dirty shared
    state on a successful processor update
    0 | Dirty shared mode enabled
    1 | Dirty shared mode disabled

4 | NoSCMode: Specifies presence of secondary cache
    0 | Secondary cache present
    1 | No secondary cache present

5:6 | SysPort: System interface port width, bit 6 most significant
    0 | 64 bits
    1-3 | Reserved

7 | SC64BitMd: Secondary cache interface port width
    0 | 128 bits
    1 | Reserved

8 | EISpltMd: Specifies secondary cache organization
    0 | Secondary cache unified
    1 | Secondary cache split

9:10 | SCBlkSz: Secondary cache line length, bit 10 most significant
    0 | 4 words
    1 | 8 words
    2 | 16 words
    3 | 32 words

11:14 | XmitDatPat: System interface data rate, bit 14 most significant
    0 | D
    1 | DDx
    2 | DDxx
    3 | DxDx
    4 | DDxxx
    5 | DDxxxx
    6 | DxxDxx
    7 | DDxxxxxx
    8 | DxxxDxxx
    9-15 | Reserved

15:17 | SysCkRatio: PClock to SClock divisor; frequency relationship
    between SClock, RClock, and TClock and PClock; bit 17 most significant
    0 | Divide by 2
    1 | Divide by 3
    2 | Divide by 4
    3 | Divide by 6 (R4400 processor only)
    4 | Divide by 8 (R4400 processor only)
    5-7 | Reserved

18 | SIMasterMd: Master/Checker Mode (see mode bit 42); used in R4400
    only.

19 | TimIntDis: Timer Interrupt enable allows timer interrupts; otherwise
    the interrupt used by the timer becomes a general purpose interrupt
    0 | Timer Interrupt enabled
    1 | Timer Interrupt disabled

20 | PotUpdDis: Potential update enable allows potential updates to be
    issued. Otherwise, only compulsory updates are issued
    0 | Potential updates enabled
    1 | Potential updates disabled

21:24 | TWrSUp: Secondary cache write deassertion delay, TWrSup in
    PCycles, bit 24 most significant
    0-2 | Undefined
    3-15 | Number of PClock cycles: Min 3, Max 15

25:26 | TWr2Dly: Secondary cache write assertion delay 2, TWr2Dly in
    PCycles, bit 26 most significant
    0 | Undefined
    1-3 | Number of PClock cycles: Min 1, Max 3

27:28 | TWr1Dly: Secondary cache write assertion delay 1, TWr1Dly in
    PCycles, bit 28 most significant
    0 | Undefined
    1-3 | Number of PClock cycles: Min 1, Max 3

29 | TWrRc: Secondary cache write recovery time, TWrRc in PCycles, either
    0 or 1 cycle
    0 | 0 cycle
    1 | 1 cycle

30:32 | TDis: Secondary cache disable time, TDis in PCycles, bit 32 most
    significant
    0-1 | Undefined
    2-7 | Number of PClock cycles: Min 2, Max 7

33:36 | TRd2Cyc: Secondary cache read cycle time 2, TRdCyc2 in PCycles,
    bit 36 most significant
    0-1 | Undefined
    2-15 | Number of PClock cycles: Min 2, Max 15

37:40 | TRd1Cyc: Secondary cache read cycle time 1, TRdCyc1 in PCycles,
    bit 40 most significant
    0-3 | Undefined
    4-15 | Number of PClock cycles: Min 4, Max 15

41 | NoMPmode: Secondary cache line is not invalidated
    0 | NoMPmode off: after a secondary cache miss, the existing valid
        cache line is invalidated (following writeback if necessary)
    1 | NoMPmode on: after a secondary cache miss, the existing valid
        cache line is not invalidated. Available on the R4000SC and
        R4400SC, to improve performance.

42 | SCMasterMd: Selects the type of Master/Checker mode (also see
    description of mode bit 18). Used in R4400 only.
    SCMasterMd (Bit 42) | SIMasterMd (Bit 18) | Mode
    0 | 0 | Complete Master (required for single-chip operation)
    1 | 1 | Complete Listener (paired with Complete Master)
    1 | 0 | System Interface Master (SIMaster)
    0 | 1 | Secondary Cache Master (SCMaster, paired with SIMaster)

43:45 | Reserved
    0 | Reserved

46 | Pkg179: R4000 Processor Package type
    0 | Large (447 pin)
    1 | Small (179 pin)

47:49 | CycDivisor: This mode determines the clock divisor for the
    reduced power mode. When the RP bit in the Status register is set to
    1, the pipeline clock is divided by one of the following values. Bit
    49 is the most significant.
    0 | Divide by 2
    1 | Divide by 4
    2 | Divide by 8
    3 | Divide by 16
    4-7 | Reserved

50:52 | Drv0_50, Drv0_75, Drv1_00: Drive the outputs out in n x
    MasterClock period. Bit 52 is the most significant. Combinations not
    defined below are reserved.
    1 | Drive at 0.50 x MasterClock period
    2 | Drive at 0.75 x MasterClock period
    4 | Drive at 1.00 x MasterClock period

53:56 | InitP: Initial values for the state bits that determine the
    pull-down Δi/Δt and switching speed of the output buffers. Bit 53 is
    the most significant.
    0 | Fastest pull-down rate
    1-14 | Intermediate pull-down rates
    15 | Slowest pull-down rate

57:60 | InitN: Initial values for the state bits that determine the
    pull-up Δi/Δt and switching speed of the output buffers. Bit 57 is
    the most significant.
    0 | Slowest pull-up rate
    1-14 | Intermediate pull-up rates
    15 | Fastest pull-up rate

61 | EnblDPLLR: Enables the negative feedback loop that determines the
    Δi/Δt and switching speed of the output buffers during ColdReset.
    0 | Disable Δi/Δt mechanism
    1 | Enable Δi/Δt mechanism

62 | EnblDPLL: Enables the negative feedback loop that determines the
    Δi/Δt and switching speed of the output buffers during ColdReset and
    during normal operation.
    0 | Disable Δi/Δt control mechanism
    1 | Enable Δi/Δt control mechanism

63 | DsblPLL: Disables the phase-locked loops (PLLs) that match
    MasterClock and produce RClock, TClock, SClock, and the internal
    clocks.
    0 | Enable PLLs
    1 | Disable PLLs

64 | SRTristate: Controls when output-only pins are tristated
    0 | Only when ColdReset* is asserted
    1 | When Reset* or ColdReset* are asserted

65:255 | Reserved. Scan in zeros.
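
To illustrate how field values from Table 9-1 map onto the 256-bit serial stream, here is a minimal encoder sketch. It is not a tool supplied with the processor: the `FIELDS` table covers only a few of the fields, `encode_boot_stream` is a hypothetical name, and the code does not reject values that the table marks as reserved or undefined. Note the per-field flag for which end of the bit range is most significant, since the table gives this separately for each field.

```python
# name: (low bit, high bit, True if the high-numbered bit is most significant)
FIELDS = {
    "BlkOrder":   (0, 0, True),
    "EIBParMode": (1, 1, True),
    "EndBit":     (2, 2, True),
    "NoSCMode":   (4, 4, True),
    "SCBlkSz":    (9, 10, True),
    "XmitDatPat": (11, 14, True),
    "SysCkRatio": (15, 17, True),
    "InitP":      (53, 56, False),  # table: bit 53 is the most significant
    "InitN":      (57, 60, False),  # table: bit 57 is the most significant
}


def encode_boot_stream(settings):
    """Return a list of 256 bits; index n is serial bit n of the stream.

    Fields not named in settings, and all reserved bits, are left at zero.
    """
    stream = [0] * 256
    for name, value in settings.items():
        lo, hi, msb_is_high = FIELDS[name]
        width = hi - lo + 1
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}: value {value} does not fit in {width} bits")
        for i in range(width):          # i = 0 is the least significant bit
            pos = lo + i if msb_is_high else hi - i
            stream[pos] = (value >> i) & 1
    return stream
```

For example, `encode_boot_stream({"EndBit": 1, "SysCkRatio": 2})` sets serial bit 2 (big-endian ordering) and serial bit 16 (divide by 4), leaving every other bit zero.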

This chapter describes the clock signals ("clocks") used in the R4000
processor and the processor status reporting mechanism.

The subject matter includes basic system clocks, system timing
parameters, connecting clocks to a phase-locked system, connecting clocks
to a system without phase locking, and processor status outputs.



MIPS R4000 Microprocessor User's Manual 






Chapter 10 



10.1 Signal Terminology 

The following terminology is used in this chapter (and book) when 
describing signals: 

• Rising edge indicates a low-to-high transition. 

• Falling edge indicates a high-to-low transition. 

• Clock-to-Q delay is the amount of time it takes for a signal to 
move from the input of a device (clock) to the output of the 
device (Q). 

Figures 10-1 and 10-2 illustrate these terms. 



Figure 10-1 Signal Transitions (labels: single clock cycle, low-to-high
transition, high-to-low transition)

Figure 10-2 Clock-to-Q Delay (labels: clock input, data out)
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10.2 Basic System Clocks 

The various clock signals used in the R4000 processor are described below, 
starting with MasterClock, upon which the processor bases all internal 
and external clocking. 

MasterClock 

The processor bases all internal and external clocking on the single
MasterClock input signal.



MasterOut 

The processor generates the clock output signal, MasterOut, at the same
frequency as MasterClock and aligns MasterOut with MasterClock if
SyncIn is connected to SyncOut. MasterOut clocks external logic, such as
the reset logic.



SyncIn/SyncOut 

The processor generates SyncOut at the same frequency as MasterClock
and aligns SyncIn with MasterClock.

SyncOut must be connected to SyncIn either directly or through an
external buffer. The processor can compensate for both output driver and
input buffer delays (and, when necessary, the delay caused by an external
buffer) when aligning SyncIn with MasterClock. Figure 10-7 illustrates
SyncOut connected to SyncIn through an external buffer.

PClock 

The processor generates an internal clock, PClock, at twice the frequency 
of MasterClock and precisely aligns every other rising edge of PClock 
with the rising edge of MasterClock. 

All internal registers and latches use PClock. 
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SClock 

The R4000 processor divides PClock by 2, 3, or 4 (as programmed at boot- 
mode initialization) to generate the internal clock signal, SClock. The 
R4400 processor divides PClock by 2, 3, 4, 6 or 8 (as programmed at boot- 
mode initialization) to generate SClock. The processor uses SClock to 
sample data at the system interface and to clock data into the processor 
system interface output registers. 

The first rising edge of SClock, after ColdReset* is deasserted, is aligned 
with the first rising edge of MasterClock. 



TClock 

TClock (transmit clock) clocks the output registers of an external agent,†
and can be a global system clock for any other logic in the external agent.

TClock is the same frequency as SClock. When SyncIn is shorted to
SyncOut, the edges of TClock align precisely with the edges of SClock
and MasterClock.

When a delay is added between SyncIn and SyncOut, the TClock at the
pins leads SClock (and thus MasterClock) by the same amount of delay.

If the delay between SyncIn and SyncOut is matched to an external delay
between TClock at the processor and TClock at the external logic, the
TClock at the external logic aligns to SClock and MasterClock.



RClock 

The external agent uses RClock (receive clock) to clock its input registers.
The processor generates RClock at the same frequency as TClock, but
RClock always leads TClock and SClock by 25 percent of the SClock cycle
time. The relationship between RClock and TClock is independent of the
delay between SyncIn and SyncOut.
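
The frequency relationships among the clocks described above can be sketched in a few lines of Python. This is only an illustration: the function name, the dictionary keys, and the 50 MHz MasterClock value are assumptions made for the example and are not taken from the processor specification.

```python
# Illustrative sketch only: names and the example frequency are hypothetical.
R4000_DIVISORS = (2, 3, 4)          # PClock-to-SClock divisors on the R4000
R4400_DIVISORS = (2, 3, 4, 6, 8)    # PClock-to-SClock divisors on the R4400


def derive_clocks(master_hz, divisor, allowed=R4000_DIVISORS):
    """Return clock frequencies (Hz) and the RClock lead time (seconds)."""
    if divisor not in allowed:
        raise ValueError("unsupported PClock-to-SClock divisor: %r" % divisor)
    pclock_hz = 2 * master_hz           # PClock runs at twice MasterClock
    sclock_hz = pclock_hz / divisor     # SClock = PClock / divisor
    sclock_period = 1.0 / sclock_hz
    return {
        "PClock": pclock_hz,
        "SClock": sclock_hz,
        "TClock": sclock_hz,            # TClock has the same frequency as SClock
        "RClock": sclock_hz,            # RClock has the same frequency as TClock
        "RClock_lead_s": 0.25 * sclock_period,  # RClock leads by 25% of a cycle
    }


clocks = derive_clocks(50e6, 2)
```

Passing `allowed=R4400_DIVISORS` admits the additional divide-by-6 and divide-by-8 settings of the R4400.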



PClock-to-SClock Division 

Figure 10-3 shows the clocks for a PClock-to-SClock division by 2; Figure 
10-4 shows the clocks for a PClock-to-SClock division by 4. 



† External agent is defined in Chapter 12.
Figure 10-3 Processor Clocks, PClock-to-SClock Division by 2

Figure 10-4 Processor Clocks, PClock-to-SClock Division by 4
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10.3 System Timing Parameters 



Alignment to SClock

As shown in Figures 10-3 and 10-4, data provided to the processor must be
stable a minimum of t_DS nanoseconds (ns) before the rising edge of SClock
and be held valid for a minimum of t_DH ns after the rising edge of SClock.

Processor data becomes stable a minimum of t_DM ns and a maximum of
t_DO ns after the rising edge of SClock. This drive time is the sum of the
maximum delay through the processor output drivers and the
maximum clock-to-Q delay of the processor output registers.

Alignment to MasterClock

Certain processor inputs (specifically VCCOk, ColdReset*, and Reset*)
are sampled based on MasterClock, while certain outputs (specifically
Status(7:0)) are driven based on MasterClock. The same setup, hold, and
drive-off parameters (t_DS, t_DH, t_DM, and t_DO, shown in Figures 10-3
and 10-4) apply to these inputs and outputs, but they are measured
relative to MasterClock instead of SClock.

Phase-Locked Loop (PLL)

The processor aligns SyncOut, PClock, SClock, TClock, and RClock with
internal phase-locked loop (PLL) circuits that generate aligned clocks
based on SyncOut/SyncIn. By their nature, PLL circuits are only capable
of generating aligned clocks for MasterClock frequencies within a limited
range.

Clocks generated using PLL circuits contain some inherent inaccuracy, or
jitter; a clock aligned with MasterClock by the PLL can lead or trail
MasterClock by as much as the maximum jitter allowed by the
individual vendor.
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10.4 Connecting Clocks to a Phase-Locked System 



When the processor is used in a phase-locked system, the external agent 
must phase lock its operation to a common MasterClock. In such a system, 
the delivery of data and data sampling have common characteristics, even 
if the components have different delay values. For example, transmission 
time (the amount of time a signal takes to move from one component to 
another along a trace on the board) between any two components A and B 
of a phase-locked system can be calculated from the following equation: 

Transmission Time = (SClock period) - (t_DO for A) - (t_DS for B)
                    - (Clock Jitter for A Max) - (Clock Jitter for B Max)

Figure 10-5 shows a block-level diagram of a phase-locked system using 
the R4000 processor. 
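
As a worked illustration of the transmission-time equation for a phase-locked system, the following Python sketch evaluates the budget for one set of numbers. It reads the two timing parameters in the equation as the maximum data output delay of component A and the data setup time of component B; the function name, argument names, and the nanosecond values are hypothetical and chosen only to exercise the formula.

```python
def phase_locked_transmission_time(sclock_period, t_do_a, t_ds_b,
                                   jitter_a_max, jitter_b_max):
    """Time budget (same units as the inputs) for a signal travelling A -> B."""
    return sclock_period - t_do_a - t_ds_b - jitter_a_max - jitter_b_max


# Hypothetical values in nanoseconds; not taken from any data sheet.
budget_ns = phase_locked_transmission_time(
    sclock_period=20.0, t_do_a=10.0, t_ds_b=5.0,
    jitter_a_max=0.5, jitter_b_max=0.5)
```

A negative result would indicate that the board trace delay cannot be accommodated at the chosen SClock period.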



Figure 10-5 R4000 Processor Phase-Locked System (block diagram; labels:
MasterClock, R4000, External Agent, SysCmd, SysAD, SyncOut, SyncIn,
RClock, TClock)
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10.5 Connecting Clocks to a System without Phase Locking 

When the R4000 processor is used in a system in which the external agent 
cannot lock its phase to a common MasterClock, the output clocks RClock 
and TClock can clock the remainder of the system. Two clocking 
methodologies are described in this section: connecting to a gate-array 
device or connecting to discrete CMOS logic devices. 

Connecting to a Gate-Array Device 

When connecting to a gate-array device, both RClock and TClock are 
used within the gate-array. The gate array internally buffers RClock and 
uses this buffered version to clock registers that sample processor outputs. 

These sampling registers should be immediately followed by staging 
registers clocked by an internally buffered version of TClock. This 
buffered version of TClock should be the global system clock for the logic 
inside the gate array and the clock for all registers that drive processor 
inputs. Figure 10-6 is a block diagram of this circuit. 

Staging registers place a constraint on the sum of the clock-to-Q delay of 
the sample registers and the setup time of the staging registers inside the 
gate arrays, as shown in the following equation: 

Clock-to-Q Delay + Setup of Staging Register
    < 0.25 (RClock period)
      - (Maximum Clock Jitter for RClock)
      - (Maximum Delay Mismatch for Internal Clock
         Buffers on RClock and TClock)



Figure 10-6 is a block diagram of a system without phase lock, using the 
R4000 processor with an external agent implemented as a gate array. 
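
The staging-register constraint can be checked numerically. The sketch below assumes the constraint is an upper bound: because RClock leads TClock by a quarter of a cycle, the sample register's clock-to-Q delay plus the staging register's setup time must fit in that quarter cycle after jitter and buffer mismatch are subtracted. The function name and the example values (in nanoseconds) are hypothetical.

```python
def staging_margin(clock_to_q, staging_setup, rclock_period,
                   rclock_jitter_max, buffer_mismatch_max):
    """Slack left in the quarter-cycle window between RClock and TClock.

    A positive result means the constraint is met; a negative result
    means the staging register would miss its setup time.
    """
    window = 0.25 * rclock_period - rclock_jitter_max - buffer_mismatch_max
    return window - (clock_to_q + staging_setup)


# Hypothetical values in nanoseconds.
margin_ns = staging_margin(1.5, 1.0, 20.0, 0.5, 0.5)
```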
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Figure 10-6 Gate-Array System without Phase Lock, using the R4000 Processor 
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In a system without phase lock, the transmission time for a signal from the 
processor to an external agent composed of gate arrays can be calculated 
from the following equation: 

Transmission Time = (75 percent of TClock period) - (t_DO for R4000)
                    + (Minimum External Clock Buffer Delay)
                    - (External Sample Register Setup Time)
                    - (Maximum Clock Jitter for R4000 Internal Clocks)
                    - (Maximum Clock Jitter for RClock)

The transmission time for a signal from an external agent composed of gate 
arrays to the processor in a system without phase lock can be calculated 
from the following equation: 

Transmission Time = (TClock period) - (t_DS for R4000)
                    - (Maximum External Clock Buffer Delay)
                    - (Maximum External Output Register Clock-to-Q Delay)
                    - (Maximum Clock Jitter for TClock)
                    - (Maximum Clock Jitter for R4000 Internal Clocks)
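
The two gate-array equations above can be evaluated side by side. The following Python sketch is a direct transcription of them, reading the processor timing terms as its maximum data output delay (processor to gate array) and its data setup time (gate array to processor). All function names, argument names, and numeric values are hypothetical.

```python
def to_gate_array(tclock_period, t_do, min_buf_delay, sample_setup,
                  jitter_internal_max, jitter_rclock_max):
    """Transmission-time budget from the processor to the gate array."""
    return (0.75 * tclock_period - t_do + min_buf_delay - sample_setup
            - jitter_internal_max - jitter_rclock_max)


def from_gate_array(tclock_period, t_ds, max_buf_delay, max_clock_to_q,
                    jitter_tclock_max, jitter_internal_max):
    """Transmission-time budget from the gate array to the processor."""
    return (tclock_period - t_ds - max_buf_delay - max_clock_to_q
            - jitter_tclock_max - jitter_internal_max)


# Hypothetical values in nanoseconds.
out_budget_ns = to_gate_array(20.0, 10.0, 1.0, 2.0, 0.5, 0.5)
in_budget_ns = from_gate_array(20.0, 5.0, 2.0, 3.0, 0.5, 0.5)
```

Note that only the processor-to-gate-array direction starts from 75 percent of the TClock period, since the gate array samples processor outputs with RClock, which leads TClock by a quarter cycle.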
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Connecting to a CMOS Logic System 

The processor uses matched delay clock buffers to generate aligned clocks 
to external CMOS logic. A matched delay clock buffer is inserted in the 
SyncOut/SyncIn alignment path of the processor, skewing SyncOut, 
MasterOut, RClock, and TClock to lead MasterClock by the buffer delay 
amount, while leaving PClock aligned with MasterClock. 

The remaining matched delay clock buffers are available to generate a 
buffered version of TClock aligned with MasterClock. Alignment error 
of this buffered TClock is the sum of the maximum delay mismatch of the 
matched delay clock buffers, and the maximum clock jitter of TClock. 

As the global system clock for the discrete logic that forms the external
agent, the buffered version of TClock clocks registers that sample
processor outputs, as well as clocking the registers that drive the processor
inputs.

The transmission time for a signal from the processor to an external agent 
composed of discrete CMOS logic devices can be calculated from the 
following equation: 

Transmission Time = (TClock period) - (t_DO for R4000)
                    - (External Sample Register Setup Time)
                    - (Maximum External Clock Buffer Delay Mismatch)
                    - (Maximum Clock Jitter for R4000 Internal Clocks)
                    - (Maximum Clock Jitter for TClock)

Figure 10-7 is a block diagram of a system without phase lock, employing 
the R4000 processor and an external agent composed of both a gate array 
and discrete CMOS logic devices. 
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Figure 10-7 Gate Array and CMOS System without Phase Lock, using the R4000 Processor 
The transmission time for a signal from an external agent composed of
discrete CMOS logic devices to the processor can be calculated from the
following equation:

Transmission Time = (TClock period) - (t_DS for R4000)
                    - (Maximum External Output Register Clock-to-Q Delay)
                    - (Maximum External Clock Buffer Delay Mismatch)
                    - (Maximum Clock Jitter for R4000 Internal Clocks)
                    - (Maximum Clock Jitter for TClock)

In this clocking methodology, the hold time of data driven from the
processor to an external sampling register is a critical parameter. To
guarantee hold time, the minimum output delay of the processor, t_DM,
must be greater than the sum of:

minimum hold time for the external sampling register
+ maximum clock jitter for R4000 internal clocks
+ maximum clock jitter for TClock
+ maximum delay mismatch of the external clock buffers
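
The hold-time requirement above reduces to a single comparison. The following Python sketch shows it under the assumption that all quantities are expressed in the same unit; the function name, argument names, and sample values are hypothetical.

```python
def hold_time_ok(t_dm, ext_hold_min, jitter_internal_max,
                 jitter_tclock_max, buffer_mismatch_max):
    """True if the processor's minimum output delay guarantees hold time."""
    required = (ext_hold_min + jitter_internal_max
                + jitter_tclock_max + buffer_mismatch_max)
    return t_dm > required


# Hypothetical values in nanoseconds.
ok = hold_time_ok(3.0, 1.0, 0.5, 0.5, 0.5)
too_tight = hold_time_ok(2.0, 1.0, 0.5, 0.5, 0.5)
```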
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10.6 Processor Status Outputs 

The R4400 processor provides eight status outputs, Status(7:0), aligned 
with each rising edge of MasterClock. At time T (the first PCycle of 
MasterClock when status is examined) these status outputs indicate 
whether the machine was running or stalled during the previous T-2 and 
T-3 PCycles, as follows: 

• If the machine was stalled during the T-2 or T-3 PCycles, the
status outputs indicate the type of stall that occurred (listed
in Table 10-1).

• If the machine was running during the T-2 or T-3 PCycles, the
status outputs describe the type of instruction that occupied
the WriteBack pipeline stage during the T-2 or T-3 PCycles, and
that was successfully completed (listed in Table 10-1).

• The status outputs also indicate whether an instruction in the
T-2 or T-3 PCycle was killed and, if so, state the cause (listed
in Table 10-1).

The Status(7:0) bits are treated as two fields, as follows: 

• The Status(7:4) field indicates the internal status of the 
processor during PCycle T-3. 

• The Status(3:0) bits indicate the internal status of the processor 
during the PCycle T-2. 
Table 10-1 shows the encoding of the processor's status on pins
Status(7:4) or Status(3:0).

Table 10-1 Encoding of R4400 Processor Internal State by Status(7:4) or Status(3:0)

Status(7:4) or
Status(3:0)      Cycle         Processor Internal Status

0                Run cycle     Other integer instruction (not load, store, or
                               conditional branch; includes ERET and Jump
                               instructions)
1                Run cycle     Load
2                Run cycle     Untaken conditional branch
3                Run cycle     Taken conditional branch
4                Run cycle     Store
5                Reserved
6                Stall cycle   MP stall
7                Run cycle     Integer instruction killed by slip
8                Stall cycle   Other stall type
9                Stall cycle   Primary instruction cache stall
a                Stall cycle   Primary data cache stall
b                Stall cycle   Secondary cache stall
c                Run cycle     Other floating-point instruction (not load,
                               store, or conditional branch)
d                Run cycle     Instruction killed by branch, jump, or ERET
e                Run cycle     Instruction killed by exception
f                Run cycle     Floating-point instruction killed by slip
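
The split of Status(7:0) into two 4-bit fields, and the lookup of each field in Table 10-1, can be sketched as follows. The encoding values are transcribed from Table 10-1; the dictionary layout, the function name, and the sample byte are assumptions made for this illustration.

```python
# Encoding transcribed from Table 10-1; the data structure is hypothetical.
STATUS_ENCODING = {
    0x0: ("Run cycle", "Other integer instruction"),
    0x1: ("Run cycle", "Load"),
    0x2: ("Run cycle", "Untaken conditional branch"),
    0x3: ("Run cycle", "Taken conditional branch"),
    0x4: ("Run cycle", "Store"),
    0x5: ("Reserved", "Reserved"),
    0x6: ("Stall cycle", "MP stall"),
    0x7: ("Run cycle", "Integer instruction killed by slip"),
    0x8: ("Stall cycle", "Other stall type"),
    0x9: ("Stall cycle", "Primary instruction cache stall"),
    0xA: ("Stall cycle", "Primary data cache stall"),
    0xB: ("Stall cycle", "Secondary cache stall"),
    0xC: ("Run cycle", "Other floating-point instruction"),
    0xD: ("Run cycle", "Instruction killed by branch, jump, or ERET"),
    0xE: ("Run cycle", "Instruction killed by exception"),
    0xF: ("Run cycle", "Floating-point instruction killed by slip"),
}


def decode_status(status_byte):
    """Split Status(7:0) into its T-3 and T-2 fields and decode each one."""
    if not 0 <= status_byte <= 0xFF:
        raise ValueError("Status(7:0) is an 8-bit value")
    t3 = (status_byte >> 4) & 0xF   # Status(7:4): PCycle T-3
    t2 = status_byte & 0xF          # Status(3:0): PCycle T-2
    return {"T-3": STATUS_ENCODING[t3], "T-2": STATUS_ENCODING[t2]}


decoded = decode_status(0x19)
```

For the sample byte 0x19, the upper field (1) reports a completed load in PCycle T-3 and the lower field (9) reports a primary instruction cache stall in PCycle T-2.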
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This chapter describes in detail the cache memory: its place in the R4000 
memory organization, individual operations of the primary and 
secondary caches, cache interactions, and an example of a cache coherency 
request cycle. The chapter concludes with a description of R4000 
processor synchronization in a multiprocessor environment. 

This chapter uses the following terminology: 

• The primary cache may also be referred to as the P-cache. 

• The secondary cache may also be referred to as the S-cache. 

• The primary data cache may also be referred to as the D-cache. 

• The primary instruction cache may also be referred to as the 
I-cache. 

These terms are used interchangeably throughout this book. 
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11.1 Memory Organization 

Figure 11-1 shows the R4000 system memory hierarchy. In the logical 
memory hierarchy, caches lie between the CPU and main memory. They 
are designed to make the speedup of memory accesses transparent to the 
user. Each functional block in Figure 11-1 has the capacity to hold more
data than the block above it. For instance, physical main memory has a 
larger capacity than the secondary cache. At the same time, each 
functional block takes longer to access than any block above it. For 
instance, it takes longer to access data in main memory than in the CPU 
on-chip registers. 
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Figure 11-1 Logical Hierarchy of Memory 
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The R4000 processor has two on-chip primary caches: one holds 
instructions (the instruction cache), the other holds data (the data cache). 
Off-chip, the R4000 processor supports a secondary cache on the R4000SC
and R4000MC models.

11.2 Overview of Cache Operations 

As described earlier, caches provide fast temporary data storage, and they 
make the speedup of memory accesses transparent to the user. In general, 
the processor accesses cache-resident instructions or data through the 
following procedure: 

1. The processor, through the on-chip cache controller, attempts to access 
the next instruction or data in the primary cache. 

2. The cache controller checks to see if this instruction or data is present 
in the primary cache. 

• If the instruction/data is present, the processor retrieves it.
This is called a primary-cache hit.

• If the instruction/data is not present in the primary cache, the
cache controller must retrieve it from the secondary cache or
memory. This is called a primary-cache miss.

3. If a primary-cache miss occurs, the cache controller checks to see if the
instruction/data is in the secondary cache.

• If the instruction/data is present in the secondary cache, it is
retrieved and written into the primary cache.

• If the instruction/data is not present in the secondary cache, it
is retrieved from memory as a cache line (a block whose size is set
in the Config register; see the section titled Variable-Length Cache
Lines in this chapter for available cache line lengths) and is
written into both the secondary cache and the appropriate primary
cache.

4. The processor retrieves the instruction/data from the primary cache
and operation continues.

It is possible for the same data to be in three places simultaneously: main
memory, secondary cache, and primary cache. This data is kept consistent
through the use of a write-back methodology; that is, modified data is not
written back to memory until the cache line is replaced.
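
The access procedure in steps 1 through 4 can be modeled with a deliberately simplified Python sketch. It uses plain dictionaries as stand-ins for the primary cache, the secondary cache, and memory, and it ignores cache line granularity, replacement, and cache states; the function name and data layout are assumptions made for illustration only.

```python
def fetch(address, primary, secondary, memory):
    """Return (value, where_found) following steps 1 through 4."""
    if address in primary:                      # primary-cache hit
        return primary[address], "primary"
    if address in secondary:                    # primary miss, secondary hit
        primary[address] = secondary[address]   # fill the primary cache
        return primary[address], "secondary"
    value = memory[address]                     # miss in both caches
    secondary[address] = value                  # fill the secondary cache ...
    primary[address] = value                    # ... and the primary cache
    return primary[address], "memory"


primary, secondary, memory = {}, {}, {0x100: 42}
first = fetch(0x100, primary, secondary, memory)
second = fetch(0x100, primary, secondary, memory)
```

The first access misses in both caches and fills them from memory; the second access to the same address is a primary-cache hit.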
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11.3 R4000 Cache Description 

As Figure 11-1 shows, the R4000 contains separate primary instruction and 
data caches. Figure 11-1 also shows that the R4000 supports a secondary 
cache that can be split into separate portions, one portion containing data 
and the other portion containing instructions, or it can be a joint cache, 
holding combined instructions and data. 

This section describes the organization of on-chip primary caches and the 
optional off-chip secondary cache. Table 11-1 lists the cache and cache 
coherency support for the three R4000 models. 



Table 11-1 R4000 Cache and Coherency Support 



R4000      Support            Support              Support
Model      Primary Cache?     Secondary Cache?     Cache Coherency?

R4000PC    Yes                No                   No
R4000SC    Yes                Yes                  No
R4000MC    Yes                Yes                  Yes



Figure 11-2 provides block diagrams of the three R4000 models: 

• R4000PC, which supports only the primary cache 

• R4000SC and R4000MC, which support both primary and 
secondary caches 
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Figure 11-2 Cache Support in the R4000PC, R4000SC, and R4000MC 
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Secondary Cache Size 

Table 11-2 lists the range of secondary cache sizes. The secondary cache is 
user-configurable at boot time through the boot-mode bits (see Chapter 9); 
it can be a joint cache, containing both data and instructions in a single 
cache, or split into separate data and instruction caches. 



Table 11-2 Secondary Cache Sizes 



Cache                      Minimum Size    Maximum Size

Secondary Joint Cache      128 Kbytes      4 Mbytes
Secondary Split I-Cache    128 Kbytes      2 Mbytes
Secondary Split D-Cache    128 Kbytes      2 Mbytes



Variable-Length Cache Lines 

A cache line is the smallest unit of information that can be fetched from the
cache, and that is represented by a single tag.† A primary cache line can
be either 4 or 8 words in length; a secondary cache line can be 4, 8,
16, or 32 words in length. Primary cache line length is set in the Config
register; see Chapter 4 for more information. Secondary cache line length
is set at boot time through the boot-mode bits, as described in Chapter 9.

Upon a cache miss in both primary and secondary caches, the missing 
secondary cache line is loaded first from memory into the secondary 
cache, whereupon the appropriate subset of the secondary cache line is 
loaded into the primary cache. 

The primary cache line length must always be less than or equal to the
secondary cache line length. For example, the secondary cache cannot
have a 4-word line length while the primary cache has an 8-word line
length.
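
The line-length rules in this section can be expressed as a small validity check. The sketch below is illustrative only; the constant and function names are hypothetical, and the permitted lengths are those listed above.

```python
PRIMARY_LINE_WORDS = (4, 8)             # permitted primary line lengths
SECONDARY_LINE_WORDS = (4, 8, 16, 32)   # permitted secondary line lengths


def line_lengths_valid(primary_words, secondary_words):
    """True if both lengths are permitted and primary <= secondary."""
    return (primary_words in PRIMARY_LINE_WORDS
            and secondary_words in SECONDARY_LINE_WORDS
            and primary_words <= secondary_words)
```

For instance, `line_lengths_valid(8, 4)` is rejected because the primary line would be longer than the secondary line.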

Cache Organization and Accessibility 

This section describes the organization of the primary and secondary
caches, including the manner in which they are mapped, the addressing
(either virtual or physical) used to index the cache, and composition of the
cache lines. The primary instruction and data caches are indexed with a
virtual address (VA), while the secondary cache is indexed with a physical
address (PA).



† Primary and secondary cache tags are described in the following sections.
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Organization of the Primary Instruction Cache (I-Cache) 

Each line of primary I-cache data (although it is actually an instruction, it 
is referred to as data to distinguish it from its tag) has an associated 26-bit 
tag that contains a 24-bit physical address, a single valid bit, and a single 
parity bit. Byte parity is used on I-cache data. 

The R4000 processor primary I-cache has the following characteristics: 

• direct-mapped 

• indexed with a virtual address 

• checked with a physical tag 

• organized with either a 4-word (16-byte) or 8-word (32-byte) 
cache line. 

Figure 11-3 shows the format of an 8-word (32-byte) primary I-cache line. 



Tag (26 bits):

Bit(s)   Field   Width
25       P       1
24       V       1
23:0     PTag    24

Data (four doublewords, each with its parity):

Bit(s)   Field   Width
71:64    DataP   8
63:0     Data    64

PTag     Physical tag (bits 35:12 of the physical address)
V        Valid bit
Data     Cache data
P        Even parity for the PTag and V fields
DataP    Even parity; 1 parity bit per byte of data

Figure 11-3 R4000 8-Word Primary I-Cache Line Format
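
The tag layout in Figure 11-3 can be illustrated by packing and unpacking a 26-bit I-cache tag. In this Python sketch, even parity is taken to mean that the parity bit is chosen so that the protected field and the parity bit together contain an even number of 1 bits; the function names and the sample address are hypothetical.

```python
def even_parity(value):
    """Parity bit that makes the total number of 1 bits even."""
    return bin(value).count("1") & 1


def pack_icache_tag(physical_address, valid):
    """Build a 26-bit tag: P (bit 25), V (bit 24), PTag (bits 23:0)."""
    ptag = (physical_address >> 12) & 0xFFFFFF   # PA(35:12), 24 bits
    v = 1 if valid else 0
    p = even_parity((v << 24) | ptag)            # P covers PTag and V
    return (p << 25) | (v << 24) | ptag


def unpack_icache_tag(tag):
    """Split a 26-bit tag into its P, V, and PTag fields."""
    return {"P": (tag >> 25) & 1, "V": (tag >> 24) & 1, "PTag": tag & 0xFFFFFF}


tag = pack_icache_tag(0x12345000, True)
fields = unpack_icache_tag(tag)
```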
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Organization of the Primary Data Cache (D-Cache) 

Each line of primary D-cache data has an associated 29-bit tag that 
contains a 24-bit physical address, 2-bit cache line state, a write-back bit, a 
parity bit for the physical address and cache state fields, and a parity bit 
for the write-back bit. Byte parity is used on D-cache data. 

The R4000 processor primary D-cache has the following characteristics: 

• write-back 

• direct-mapped 

• indexed with a virtual address 

• checked with a physical tag 

• organized with either a 4-word (16-byte) or 8-word (32-byte) 
cache line. 

Figure 11-4 shows the format of an 8-word (32-byte) primary D-cache line.



Tag (29 bits):

Bit(s)   Field   Width
28       W'      1
27       W       1
26       P       1
25:24    CS      2
23:0     PTag    24

Data (four doublewords, each with its parity):

Bit(s)   Field   Width
71:64    DataP   8
63:0     Data    64

W'       Even parity for the write-back bit
W        Write-back bit (set if cache line has been written)
P        Even parity for the PTag and CS fields
CS       Primary cache state:
           0 = Invalid in all R4000 configurations
           1 = Shared (either Clean or Dirty) in R4000MC configuration only
           2 = Clean Exclusive in R4000SC and R4000MC configurations only
           3 = Dirty Exclusive in all R4000 configurations
PTag     Physical tag (bits 35:12 of the physical address)
DataP    Even parity for the data
Data     Cache data

Figure 11-4 R4000 8-Word Primary Data Cache Line Format
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In all R4000 processors, the W (write-back) bit, not the cache state, 
indicates whether or not the primary cache contains modified data that 
must be written back to memory or to the secondary cache. 

Accessing the Primary Caches 

Figure 11-5 shows the virtual address (VA) index into the primary caches.
Each instruction and data cache ranges in size from 8 Kbytes to 32 Kbytes;
therefore, the number of virtual address bits used to index the cache
depends on the cache size. For example, VA(12:4) indexes the tags in an
8-Kbyte cache with 4-word lines (VA(12) addresses 8 Kbytes and VA(4)
provides quadword resolution); similarly, VA(14:5) indexes the tags in a
32-Kbyte cache with 8-word lines (VA(14) addresses 32 Kbytes and VA(5)
provides octalword resolution).



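Figure 11-5 Primary Cache Data and Tag Organization (diagram: the virtual
address, VA(12:n) for an 8-Kbyte cache to VA(14:n) for a 32-Kbyte cache,
selects a tag line containing W', W, State, Tag, and P, together with the
corresponding data line; n = 4 for 4-word lines, n = 5 for 8-word lines)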
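
The way the index bits follow from the cache size and line length can be sketched in Python. The function below assumes a direct-mapped cache whose size and line length are powers of two and a 4-byte word; the function name and the sample addresses are hypothetical.

```python
def primary_cache_index(virtual_address, cache_bytes, line_words):
    """Return the line index selected by VA(hi:n) in a direct-mapped cache."""
    line_bytes = line_words * 4            # 4 bytes per word
    n = line_bytes.bit_length() - 1        # 4 for 4-word lines, 5 for 8-word
    lines = cache_bytes // line_bytes      # number of lines in the cache
    return (virtual_address >> n) & (lines - 1)


# 8-Kbyte cache, 4-word lines: the index is VA(12:4), 512 lines.
idx_small = primary_cache_index(0x1FF0, 8 * 1024, 4)
# 32-Kbyte cache, 8-word lines: the index is VA(14:5), 1024 lines.
idx_large = primary_cache_index(0x7FE0, 32 * 1024, 8)
```

Address bits above the index range do not affect the result, so two virtual addresses that differ only in those bits select the same cache line.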






Organization of the Secondary Cache 

Each secondary cache line has an associated 19-bit tag that contains bits 
35:17 of the physical address, a 3-bit primary cache index, VA(14:12), and 
a 3-bit cache line state. These 25 bits are protected by a 7-bit ECC code. 

The secondary cache is accessible to the processor and to the system 
interface; by setting the appropriate boot-mode bits, it can be configured 
at chip reset as a joint cache, or as separate I- and D-caches. 

Figure 11-6 shows the format of the R4000 processor secondary-cache line. 
The size of the secondary cache line is set in the SB field of the Config 
register. 



Tag (32 bits):

Bit(s)   Field   Width
31:25    ECC     7
24:22    CS      3
21:19    PIdx    3
18:0     STag    19

ECC      ECC for secondary tag
CS       Secondary-cache state:
           0 = Invalid
           1 = reserved
           2 = reserved
           3 = reserved
           4 = Clean Exclusive
           5 = Dirty Exclusive
           6 = Shared
           7 = Dirty Shared
PIdx     Primary cache index (bits 14:12 of the virtual address)
STag     Physical tag (bits 35:17 of the physical address)

Figure 11-6 R4000 Secondary Cache Line Format
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The R4000 processor secondary cache has the following characteristics: 

• write-back 

• direct-mapped 

• indexed with a physical address 

• checked with a physical tag 

• organized with either a 4-word (16-byte), 8-word (32-byte), 
16-word (64-byte), or 32-word (128-byte) cache line. 

The secondary cache state (CS) bits indicate whether: 

• the cache line data and tag are valid 

• the data is potentially present in the caches of other processors
(shared versus exclusive)

• the processor is responsible for updating main memory (clean 
versus dirty). 

The PIdx field provides the processor with an index to the virtual address 
of primary cache lines that may contain data from the secondary cache 
line. 

The PIdx field is also used to detect a cache alias. Cache aliasing occurs
when the physical address tag matches during a data reference to the
secondary cache, but the PIdx field does not match bits 14:12 of the
virtual address. This indicates that the cache reference was made from a
different virtual address than the one that created the secondary-cache
line, and the processor signals a Virtual Coherency exception.
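
The alias condition described above can be illustrated with a short Python sketch that unpacks a secondary cache tag using the field positions given for the secondary cache line format (ECC in bits 31:25, CS in bits 24:22, PIdx in bits 21:19, STag in bits 18:0). The function names and sample values are hypothetical, and the sketch ignores the cache state and ECC checks a real lookup would also perform.

```python
def unpack_secondary_tag(tag):
    """Split a 32-bit secondary cache tag into its fields."""
    return {
        "ECC": (tag >> 25) & 0x7F,
        "CS": (tag >> 22) & 0x7,
        "PIdx": (tag >> 19) & 0x7,
        "STag": tag & 0x7FFFF,
    }


def is_cache_alias(tag, physical_address, virtual_address):
    """True if the physical tag matches but PIdx differs from VA(14:12)."""
    fields = unpack_secondary_tag(tag)
    tag_match = fields["STag"] == ((physical_address >> 17) & 0x7FFFF)
    pidx_match = fields["PIdx"] == ((virtual_address >> 12) & 0x7)
    return tag_match and not pidx_match


# A line in the Clean Exclusive state (CS = 4) created from a virtual
# address with VA(14:12) = 3 and PA(35:17) = 1.
line_tag = (4 << 22) | (3 << 19) | 1
same_va = is_cache_alias(line_tag, 0x20000, 0x3000)
other_va = is_cache_alias(line_tag, 0x20000, 0x5000)
```

Here the second reference uses a virtual address whose bits 14:12 differ from the stored PIdx, which is the situation that leads to a Virtual Coherency exception.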
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Accessing the Secondary Cache 

Figure 11-7 shows the physical address (PA) index into the secondary 
cache. The secondary cache ranges in size from 128 Kbytes to 4 Mbytes, 
and the number of physical address bits used to index the cache depends 
upon the cache size. For instance, PA(16:4) accesses the tags in a 128-Kbyte 
secondary cache with 4-word lines; PA(21:5) accesses the tags in a 4-Mbyte
secondary cache with 8-word lines. 

The processor always uses PA(35:17) from the secondary cache, regardless 
of the S-cache size. This makes it important to initialize all secondary 
cache tag address bits with a valid physical address, regardless of the size 
of the S-cache. 



Figure 11-7 Secondary Cache Data and Tag Organization (diagram: the
physical address, PA(16:n) for a 128-Kbyte cache to PA(21:n) for a 4-Mbyte
cache, selects a tag line containing ECC, CS, PIdx, and Tag, together with
the corresponding data line; n = 4 for 4-word lines, n = 5 for 8-word
lines, n = 6 for 16-word lines, n = 7 for 32-word lines)
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11.4 Cache States 

The four terms below are used to describe the state of a cache line: 

• Exclusive: a cache line that is present in exactly one cache in 
the system is exclusive, and may be in one of the exclusive 
states. 

• Dirty: a cache line that contains data that has changed since it 
was loaded from memory is dirty, and must be in one of the 
dirty or shared states. 

• Clean: a cache line that contains data that has not changed 
since it was loaded from memory is clean, and may be in one 
of the clean states. 

• Shared: a cache line that is present in more than one cache in
the system is shared.

Each primary and secondary cache line in the R4000 system is in one of the
states described in Table 11-3. Table 11-3 also lists the types of cache
and the R4000 models in which the various states may be found.



Table 11-3 Cache States 



Cache Line   Description                                    Where the State    Available on the
State                                                       is Used            Following R4000
                                                                               Models

Invalid      A cache line that does not contain valid       Primary or         R4000PC
             information must be marked invalid, and        Secondary Cache    R4000SC
             cannot be used. For example, a cache line                         R4000MC
             is marked invalid if the same information,
             located in another cache, is modified. A
             cache line in any other state than invalid
             is assumed to contain valid information.

Shared       A cache line that is present in more than      Primary or         R4000MC only
             one cache in the system is shared.             Secondary Cache

Dirty        A dirty shared cache line contains valid       Secondary          R4000MC only
Shared       information and can be present in another      cache only
             cache. This cache line is inconsistent with
             memory and is owned by the processor (see
             the section titled Cache Line Ownership in
             this chapter).

Clean        A clean exclusive cache line contains valid    Primary or         R4000SC
Exclusive    information and this cache line is not         Secondary Cache    R4000MC
             present in any other cache. The cache line
             is consistent with memory and is not owned
             by the processor (see the section titled
             Cache Line Ownership in this chapter).

Dirty        A dirty exclusive cache line contains valid    Primary or         R4000PC
Exclusive    information and is not present in any other    Secondary Cache    R4000SC
             cache. The cache line is inconsistent with                        R4000MC
             memory and is owned by the processor (see
             the section titled Cache Line Ownership in
             this chapter).


Primary Cache States 

Each primary data cache line is in one of the following states: 

• invalid 

• shared 

• clean exclusive 

• dirty exclusive 

Each primary instruction cache line is in one of the following states: 

• invalid 

• valid 

Secondary Cache States 

Each secondary cache line is in one of the following states: 

• invalid 

• shared 

• dirty shared 

• clean exclusive 

• dirty exclusive 
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Mapping States Between Caches 

Secondary cache states correspond, or map, to primary cache states (this 
mapping is listed in Table 11-6, later on in this chapter). For example, the 
secondary cache shared and dirty shared states map to the primary cache 
shared state. 

Therefore, when the primary cache line is filled from the secondary cache, 
the state of the secondary cache line is also mapped into the primary cache; 
in the case described above, the shared or dirty shared secondary state is 
mapped to the shared primary cache state. 

As shown in Figure 11-8, a primary cache line in the R4000PC model can 
be in either an invalid or dirty exclusive state. In the R4000SC model, a 
primary cache line can be in the invalid, clean exclusive, or dirty exclusive 
state. In the R4000MC model, the primary cache line can be invalid, clean 
exclusive, dirty exclusive, or shared. 


[Figure 11-8: primary cache states by processor model.
R4000PC: Invalid, Dirty Exclusive.
R4000SC: Invalid, Clean Exclusive, Dirty Exclusive.
R4000MC: Invalid, Clean Exclusive, Dirty Exclusive, Shared.]

Figure 11-8 Primary Cache States Available to Each Type of Processor
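
The state sets in Figure 11-8 can be restated as a small lookup. The following Python sketch is illustrative only: the names PRIMARY_STATES, SECONDARY_TO_PRIMARY, and primary_state_allowed are assumptions made for this example and are not part of the processor documentation. Only the two secondary-to-primary mappings stated explicitly in this section are included; the complete mapping is given in Table 11-6.

```python
# Primary data cache states available to each processor model (Figure 11-8).
PRIMARY_STATES = {
    "R4000PC": {"invalid", "dirty exclusive"},
    "R4000SC": {"invalid", "clean exclusive", "dirty exclusive"},
    "R4000MC": {"invalid", "clean exclusive", "dirty exclusive", "shared"},
}

# Secondary-to-primary mappings stated in the text above. Other
# combinations are listed in Table 11-6 and are deliberately omitted here.
SECONDARY_TO_PRIMARY = {
    "shared": "shared",
    "dirty shared": "shared",
}


def primary_state_allowed(model, state):
    """Return True if a primary data cache line can be in `state` on `model`."""
    return state in PRIMARY_STATES[model]
```

For example, primary_state_allowed("R4000PC", "shared") is False, because the shared state is available only on the R4000MC.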

11.5 Cache Line Ownership

A processor becomes the owner of a cache line after it writes to that cache 
line (that is, by entering the dirty exclusive or dirty shared state), and is 
responsible for providing the contents of that line on a read request. 
There can only be one owner for each cache line. 

The ownership of a cache line is set and maintained through the rules 
described below. 

• A processor assumes ownership of the cache line if the state of 
the secondary cache line is dirty shared or dirty exclusive. 

• A processor that owns a cache line is responsible for writing 
the cache line back to memory if the line is replaced during the 
execution of a Write-back or Write-back Invalidate cache 
instruction. For read responses to a processor coherent read 
request (both of these terms are defined in Chapter 12) in which 
the data is returned in the dirty shared or dirty exclusive state, 
the cache state is set when the last word of read response data 
is returned. Therefore, the processor assumes ownership of the 
cache line when the last word of response data is returned. 

• For processor coherent write requests, the state of the cache 
line changes to invalid if the cache line is replaced, or to either 
clean exclusive or shared if the cache line is retained (provided
the cache line was written back to memory). In either case, the 
cache state transition occurs when the last word of write data is 
transmitted to the external agent. Therefore, the processor 
gives up ownership of the cache line when the last word of 
write data is transmitted to the external agent (Chapter 12 
defines external agent). 

• Memory always owns clean cache lines. 

• The processor gives up ownership of a cache line when the 
state of the cache line changes to invalid, shared, or clean 
exclusive. 
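
The ownership rules above reduce to a test on the secondary cache state. The sketch below is one hedged way to express that test in Python; the function names processor_owns and gives_up_ownership are hypothetical and chosen only for this illustration.

```python
# Secondary cache states in which the processor owns the line.
OWNED_STATES = {"dirty shared", "dirty exclusive"}
# States in which the processor does not own the line.
UNOWNED_STATES = {"invalid", "shared", "clean exclusive"}


def processor_owns(secondary_state):
    """Return True if the processor owns a line in the given secondary state."""
    if secondary_state in OWNED_STATES:
        return True
    if secondary_state in UNOWNED_STATES:
        return False
    raise ValueError(f"unknown cache state: {secondary_state!r}")


def gives_up_ownership(old_state, new_state):
    """Return True if the transition takes ownership away from the processor."""
    return processor_owns(old_state) and not processor_owns(new_state)
```

A transition from dirty exclusive to shared, for instance, gives up ownership, while a transition from shared to invalid does not, because the processor did not own the line to begin with.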

11.6 Cache Write Policy

The R4000 processor manages its primary and secondary caches by using
a write-back policy; that is, it stores write data into the caches, instead of
writing it directly to memory.† Some time later this data is independently
written into memory. In the R4000 implementation, a modified cache line
is not written back to memory until the cache line is replaced either in the
course of satisfying a cache miss, or during the execution of a Write-back
CACHE instruction.

If a primary cache line is in either the dirty exclusive or shared state and 
that cache line has been modified (the W bit is set), the processor writes 
this cache line back to memory (or the secondary cache, if it is present) 
when the line is replaced, either in the course of satisfying a cache miss or 
during the execution of a Write-back or Write-back Invalidate CACHE 
instruction. 

If a secondary cache line is in either the dirty exclusive or dirty shared 
state, the processor writes this cache line back to memory when the line is 
replaced, either in the course of satisfying a cache miss or during the 
execution of a Write-back CACHE instruction. 

Many systems, in particular multiprocessor systems, or systems 
employing I/O devices that are capable of DMA, require the system to 
behave as if the caches are always consistent both with memory and with 
each other. Schemes for maintaining consistency between more than one 
cache, or between caches and memory, are defined by the system cache 
coherency protocols (see the section titled Cache Coherency Overview 
later in this chapter). In the R4000 system, when the content of a cache line 
is inconsistent with memory, it is classified as dirty and is written back to 
memory according to the rules of the cache write-back policy. 

When the processor writes a cache line back to memory, it does not
ordinarily retain a copy of the cache line, and the state of the cache line is
changed to invalid. However, there is one exception. The processor
retains a copy of the cache line if a cache line is written back by the Hit
Write-back cache instruction. The processor changes the retained cache
line state to either clean exclusive if the secondary cache state was dirty
exclusive before the write, or shared if the secondary cache state was dirty
shared before the write. The processor signals this line retention during a
write by setting SysCmd(2) to a 1, as described in Chapter 12.



† An alternative to this is a write-through cache, in which information is written
simultaneously to cache and memory. 
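
The retention rule for written-back lines can be summarized in a few lines of Python. This is a minimal sketch, not a model of the hardware: the function name write_back and its return format are hypothetical, and the value 0 for SysCmd(2) when the line is not retained is an assumption (the text states only that retention is signaled by a 1).

```python
def write_back(secondary_state, hit_write_back=False):
    """Return (new_state, syscmd2) after a dirty line is written back.

    syscmd2 is 1 when the processor signals that it retains the line.
    The value 0 for the non-retained case is an assumption of this sketch.
    """
    if secondary_state not in ("dirty exclusive", "dirty shared"):
        raise ValueError("only dirty lines are written back")
    if not hit_write_back:
        # Ordinary case: no copy is retained and the line becomes invalid.
        return ("invalid", 0)
    if secondary_state == "dirty exclusive":
        return ("clean exclusive", 1)
    # The secondary state was dirty shared before the write.
    return ("shared", 1)
```

Calling write_back("dirty shared", hit_write_back=True) yields ("shared", 1): the line is retained and is now consistent with memory.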

11.7 Cache State Transition Diagrams

The following sections describe the cache state diagrams that illustrate the 
cache state transitions for both the primary and secondary caches. Figures 
11-9 and 11-10 are state diagrams of the primary and secondary caches, 
respectively. 

When an external agent supplies a cache line, the initial state of the cache 
line is specified by the external agent (see Chapter 12 for a definition of an 
external agent). Otherwise, the processor changes the state of the cache 
line during one of the following events: 

• A store to a clean exclusive cache line causes the state to be 
changed to dirty exclusive in both the primary and secondary 
caches. 

• A store to a shared cache line (that is, a line marked shared in
the primary cache and either shared or dirty shared in the
secondary cache) causes the processor to issue either an
invalidate request or an update request, depending on the
coherency attribute in the TLB entry for the page containing
the cache line. An update page attribute causes an update
request to be issued; a sharable page attribute causes an
invalidate request to be issued.

- Upon successful completion of an invalidate, the 
processor completes the store and changes the state of the 
cache line to dirty exclusive in both the primary and 
secondary caches. 

- Upon successful completion of an update, the processor 
completes the store and changes the state of the cache line 
to shared in the primary cache and dirty shared in the 
secondary cache if dirty shared mode is enabled. Dirty 
shared mode is programmable through the boot-time 
mode control interface (see Chapter 9 for a description of 
boot mode bits). If dirty shared mode is not enabled, the
primary and secondary cache lines are both left in the
shared state after successful completion of an update.

• A store to a dirty exclusive line leaves the line in the dirty
exclusive state.

These state diagrams do not cover the initial state of the system since the 
initial state is system dependent. 
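
The store transitions listed above can be sketched as a single function. The code is a hedged illustration that assumes every invalidate or update request completes successfully; the name store_hit, its parameters, and its tuple return format are hypothetical.

```python
def store_hit(primary, secondary, page_attribute="sharable",
              dirty_shared_mode=False):
    """Return (request, new_primary, new_secondary) for a store that hits.

    `request` is None when no coherency request is issued. The sketch
    assumes any issued request completes successfully.
    """
    if primary == "clean exclusive":
        return (None, "dirty exclusive", "dirty exclusive")
    if primary == "dirty exclusive":
        # The line simply stays dirty exclusive.
        return (None, "dirty exclusive", "dirty exclusive")
    if primary == "shared":
        if secondary not in ("shared", "dirty shared"):
            raise ValueError("not a shared line as defined in this section")
        if page_attribute == "update":
            new_secondary = "dirty shared" if dirty_shared_mode else "shared"
            return ("update", "shared", new_secondary)
        if page_attribute == "sharable":
            return ("invalidate", "dirty exclusive", "dirty exclusive")
        raise ValueError(f"attribute not covered here: {page_attribute!r}")
    raise ValueError(f"store does not hit in state {primary!r}")
```

With dirty shared mode enabled, store_hit("shared", "shared", "update", dirty_shared_mode=True) returns ("update", "shared", "dirty shared").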

Figure 11-9 Primary Data Cache State Diagram

If the system is in no-secondary-cache mode, the cache state provided by 
the system is ignored, and the primary data cache state is set to dirty 
exclusive. 

Figure 11-10 Secondary Cache State Diagram

The state of a secondary cache line is provided by the external agent and
is set as follows: 

Case 1. If the cache line is not present in another cache, it should be loaded
in the clean exclusive state.

Case 2. If the cache line is retained by another cache and the state of the
line in that cache remains shared or dirty shared, the line should
be loaded in the shared state.

Case 3. If the cache line is retained by another cache and the cache
relinquishes ownership to the processor making the read request,
the line should be returned in the dirty shared state.

Case 4. If the cache line is retained by another cache and ownership is
relinquished to memory, the line should be loaded in the shared
state.

Case 5. If the cache line is relinquished by another cache and ownership
is transferred to the processor making the read request, the line
should be loaded in the dirty exclusive or dirty shared state.

For case 1, if the refill occurs on a store miss, the processor changes the
cache line state to dirty exclusive. For each of the remaining cases listed
above, the R4000 processor passes the state received from the external
agent to the secondary cache.

The invalid state is never used for a refill. Software, however, should 
initialize the secondary cache to the invalid state after the system is 
powered up. 
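
The five refill cases can be written as a decision function. The sketch below is only an illustration of the cases as listed; the function name refill_states, its parameters, and the strings "requester" and "memory" are assumptions made for this example. It returns a list because case 5 permits either of two states.

```python
def refill_states(in_other_cache, retained=False, ownership_to=None,
                  store_miss=False):
    """Return the secondary cache states permitted for a refilled line.

    ownership_to is None, "requester", or "memory" (hypothetical labels).
    """
    if not in_other_cache:
        # Case 1: loaded clean exclusive; on a store miss the processor
        # itself changes the state to dirty exclusive.
        return ["dirty exclusive"] if store_miss else ["clean exclusive"]
    if retained and ownership_to is None:
        return ["shared"]                           # Case 2
    if retained and ownership_to == "requester":
        return ["dirty shared"]                     # Case 3
    if retained and ownership_to == "memory":
        return ["shared"]                           # Case 4
    if not retained and ownership_to == "requester":
        return ["dirty exclusive", "dirty shared"]  # Case 5
    raise ValueError("combination not covered by cases 1 through 5")
```

Note that "invalid" never appears in any returned list, which matches the statement that the invalid state is never used for a refill.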

11.8 Cache Coherency Overview

Systems using more than one R4000MC processor must have a mechanism 
to maintain data consistency throughout a multi-cache, multiprocessor 
system. This mechanism is called a cache coherency protocol. 



Cache Coherency Attributes 

Cache coherency attributes are necessary to ensure the consistency of data
throughout the multitude of caches that can be present in the 
multiprocessor environment. 

Bits in the translation look-aside buffer (TLB) control coherency on a per- 
page basis. Specifically, the TLB contains 3 bits per entry that provide five 
possible coherency attributes; they are listed below and described more 
fully in the following sections. 

• uncached (R4000PC, R4000SC, R4000MC) 

• noncoherent (R4000PC, R4000SC, R4000MC) 

• sharable (R4000MC only, with secondary cache) 

• update (R4000MC only, with secondary cache) 

• exclusive (R4000MC only, with secondary cache) 

Only uncached or noncoherent attributes can be used by an R4000PC or an 
R4000SC processor. 

Table 11-4 summarizes the behavior of the processor on load misses, store
misses, and store hits to shared cache lines for each of the five coherency 
attributes listed above. The following sections describe in detail the five 
coherency attributes. 



Table 11-4 Coherency Attributes and Processor Behavior

Attribute     Load Miss                 Store Miss                Store Hit Shared
-----------   -----------------------   -----------------------   ----------------
Uncached      Main memory read          Main memory write         NA
Noncoherent   Noncoherent read          Noncoherent read          Invalidate†
Exclusive     Coherent read exclusive   Coherent read exclusive   Invalidate†
Sharable      Coherent read             Coherent read exclusive   Invalidate
Update        Coherent read             Coherent read             Update

† These should not occur under normal circumstances.
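
Table 11-4 is a direct lookup from coherency attribute and event to the request the processor issues. A minimal Python sketch of that lookup follows; the names BEHAVIOR and request_for, and the lowercase event labels, are hypothetical.

```python
# Rows of Table 11-4, keyed by coherency attribute and then by event.
BEHAVIOR = {
    "uncached": {
        "load miss": "main memory read",
        "store miss": "main memory write",
        "store hit shared": None,  # NA: uncached lines are never cached
    },
    "noncoherent": {
        "load miss": "noncoherent read",
        "store miss": "noncoherent read",
        "store hit shared": "invalidate",  # should not normally occur
    },
    "exclusive": {
        "load miss": "coherent read exclusive",
        "store miss": "coherent read exclusive",
        "store hit shared": "invalidate",  # should not normally occur
    },
    "sharable": {
        "load miss": "coherent read",
        "store miss": "coherent read exclusive",
        "store hit shared": "invalidate",
    },
    "update": {
        "load miss": "coherent read",
        "store miss": "coherent read",
        "store hit shared": "update",
    },
}


def request_for(attribute, event):
    """Return the request issued for `event` on a page with `attribute`."""
    return BEHAVIOR[attribute][event]
```

For example, request_for("sharable", "store miss") returns "coherent read exclusive", while the same event on an update page returns "coherent read".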

Uncached

Lines within an uncached page are never in a cache. When a page has the 
uncached coherency attribute, the processor issues a doubleword, partial- 
doubleword, word, or partial-word read or write request directly to main 
memory (bypassing the cache) for any load or store to a location within
that page. 

Noncoherent 

Lines with a noncoherent attribute can reside in a cache; a load or store miss 
causes the processor to issue a noncoherent block read request to a 
location within the cached page. 

Sharable 

Lines with a sharable attribute must be in a multiprocessor environment 
(using the R4000MC), since shared lines can be in more than one cache at 
a time. When the coherency attribute is sharable, the processor operates as 
follows: 

• a coherent block read request is issued for a load miss to a 
location within the page, or 

• a coherent block read request that requests exclusivity is issued 
for a store miss to a location within the page. 

In most systems, coherent read requests require snoops or directory 
checks, and noncoherent read requests do not.† Cache lines within the
page are managed with a write invalidate protocol; that is, the processor 
issues an invalidate request on a store hit to a shared cache line. 

Update 

Lines with an update coherency attribute must be in a multiprocessor 
environment and can reside in more than one cache at a time. When the 
coherency attribute is update, the processor issues a coherent block read 
request for a load or store miss to a location within the page. Cache lines 
within the page are managed with a write update protocol; that is, the 
processor issues an update request on a store hit to a shared cache line. 



† A coherent read that requests exclusivity implies that the processor functions most
efficiently if the requested cache line is returned to it in an exclusive state, but the
processor still performs correctly if the cache line is returned in a shared state.

Exclusive

Lines with an exclusive coherency attribute must be in a multiprocessor 
environment. When the coherency attribute is exclusive, the processor 
issues a coherent block read request that requests exclusivity for a load or 
store miss to a location within the page. 

Cache lines within the page are managed with a write invalidate protocol. 

NOTE: Load Linked-Store Conditional instruction sequences must 
ensure that the link location is not in a page managed with the 
exclusive coherency attribute. 



Cache Operation Modes 

The R4000 processor supports the following two cache modes: 

• secondary-cache mode (R4000MC and R4000SC models; for 
R4000MC all five cache coherency attributes described above 
are applicable, and for R4000SC only uncached and 
noncoherent coherency attributes are applicable) 

• no-secondary-cache mode (only uncached and noncoherent 
coherency attributes are applicable). 

Secondary-Cache Mode 

In its secondary-cache mode, an R4000MC model provides a set of cache 
states and mechanisms that implement a variety of cache coherency 
protocols. In particular, the processor simultaneously supports both the
write-invalidate and write-update protocols. 

No-Secondary-Cache Mode 

A processor in no-secondary-cache mode supports the uncached and 
noncoherent coherency attributes. These two attributes are described in 
the section titled Cache Coherency Attributes in this chapter. 

Strong Ordering

Cache-coherent multiprocessor systems must obey ordering constraints
on stores to shared data. A multiprocessor system that exhibits the same 
behavior as a uniprocessor system in a multiprogramming environment is 
said to be strongly ordered. 

An Example of Strong Ordering 

Given that locations X and Y have no particular relationship — that is, 
they are not in the same cache line — an example of strong ordering is as 
follows: 

1. At time T, Processor A performs a store to location X and at the same 
time processor B performs a store to location Y. 

2. At time T+1, Processor A does a load from location Y and at the same 
time processor B does a load from location X. 

For the system to be considered strongly ordered, either processor A must 
load the new value of Y, or processor B must load the new value of X, or 
both processors A and B must load the new values of Y and X, respectively, 
under all conditions. 

If processors A and B load old values of Y and X, respectively, under any
conditions, the system is not strongly ordered. 

Testing for Strong Ordering 

Table 11-5 shows the algorithm for testing strong ordering. 



Table 11-5 Algorithm for Testing Strong Ordering

Time   Processor A            Processor B
----   --------------------   --------------------
T      Store to location X    Store to location Y
T+1    Load from location Y   Load from location X

For this algorithm to succeed, stores must have a global ordering in time; 
that is, every processor in the system must agree that either the store to 
location X precedes the store to location Y, or vice versa. If this global 
ordering is enforced, the test algorithm for strong ordering succeeds. 
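
The claim that a global store order rules out the "both old" result can be checked by brute force. The Python sketch below is a simplified model, not a description of the hardware: it assumes that all four operations of Table 11-5 take effect in one global order against a single shared memory, and that each processor's store precedes its own load. The function name possible_outcomes is hypothetical.

```python
from itertools import permutations


def possible_outcomes():
    """Enumerate (value A loads from Y, value B loads from X) pairs.

    Every global ordering of the four operations is tried, keeping only
    those in which each processor's store precedes its own load.
    """
    a_store, a_load = ("A", "store", "X"), ("A", "load", "Y")
    b_store, b_load = ("B", "store", "Y"), ("B", "load", "X")
    results = set()
    for order in permutations([a_store, a_load, b_store, b_load]):
        if order.index(a_store) > order.index(a_load):
            continue  # violates program order on processor A
        if order.index(b_store) > order.index(b_load):
            continue  # violates program order on processor B
        memory = {"X": "old", "Y": "old"}
        loaded = {}
        for cpu, kind, location in order:
            if kind == "store":
                memory[location] = "new"
            else:
                loaded[cpu] = memory[location]
        results.add((loaded["A"], loaded["B"]))
    return results
```

Under these assumptions the result ("old", "old") never appears in the returned set, so at least one processor always loads a new value, which is the condition for strong ordering stated above.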

Restarting the Processor

Strong ordering requires precise control of a processor restart. 
Specifically, after completion of a processor coherency request, the system 
must ensure the completion of any cache state changes before allowing a 
processor restart. 

The following sections describe processor restarts in a strong-ordered 
system after a processor coherency request. 

Restart after a Coherent Read Request 

Unless a processor invalidate or update request is unacknowledged after
a coherent read request, the processor restarts (if sequential ordering is
enabled) after the last word in the block has been transmitted to the
processor.

Any external requests that must be completed before the read request is 
finished must be issued to the processor before the read response is issued. 

Restart after a Coherent Write Request 

The processor restarts after the coherent write request is completed. That
is, the processor restarts after the last doubleword of data associated with
the write request has been transmitted to the external agent, unless a
processor read request is pending,† or a processor invalidate or update
request is unacknowledged.

Restart after an Invalidate or Update Request 

Following an invalidate or update request, the processor restarts after the 
external agent asserts IvdAck* or IvdErr*, unless a processor read request 
is pending or the processor is processing an external request when either 
IvdAck* or IvdErr* is asserted. 

If either IvdAck* or IvdErr* is asserted during or after the first cycle that 
the external agent asserts ExtRqst*, the processor accepts the external 
request and completes any cache state changes associated with the 
external request before restarting. 



† That is, present but not yet executed.

If either IvdAck* or IvdErr* is asserted before, but not asserted during or
after the first cycle that the external agent asserts ExtRqst*, the processor 
restarts before beginning the external request. 

External requests must be completed before a processor invalidate or
update completes. They can be completed provided the external agent
asserts ExtRqst* to the processor either before or during the cycle in
which IvdAck* or IvdErr* is asserted.
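
The timing rule for restarts after an invalidate or update request comes down to comparing two cycle numbers. The sketch below is a hedged illustration only: cycles are modeled as plain integers, and the function name and parameters are hypothetical rather than part of the System interface definition.

```python
def external_request_before_restart(ack_cycle, ext_rqst_cycle=None):
    """Return True if the external request is handled before the restart.

    ack_cycle      -- cycle in which IvdAck* or IvdErr* is asserted
    ext_rqst_cycle -- first cycle in which ExtRqst* is asserted, or None
                      if no external request is made
    """
    if ext_rqst_cycle is None:
        return False
    # Acknowledge during or after the first ExtRqst* cycle: the processor
    # accepts the external request and completes its cache state changes
    # before restarting. Acknowledge earlier: the processor restarts first.
    return ack_cycle >= ext_rqst_cycle
```

For example, with ExtRqst* first asserted in cycle 10, an acknowledge in cycle 10 or later returns True, and an acknowledge in cycle 9 returns False.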

11.9 Maintaining Coherency on Loads and Stores 

Cache coherency protocols maintain data consistency throughout a
multiprocessor environment. Table 11-6 lists the coherency effects of load
and store operations on primary and secondary cache states in a
multiprocessor environment (using an R4000MC processor).



Table 11-6 R4000MC Data Cache Coherency States

Primary            Secondary         Action on   Action on
Cache States       Cache States      Load        Store
----------------   ---------------   ---------   ---------------------------------------
Invalid            Any               Miss        Miss

Shared             Shared,           None        Read secondary tag. If the coherency
                   Dirty Shared                  algorithm is Update on Write, then
                                                 send update and set the secondary
                                                 cache state to Dirty Shared. If the
                                                 coherency algorithm is Invalidate on
                                                 Write, then send invalidate and set the
                                                 primary and secondary cache states to
                                                 Dirty Exclusive.

                   Dirty Exclusive   None        Set the primary cache state to Dirty
                                                 Exclusive.

Clean Exclusive    Clean Exclusive   None        Set the primary and secondary cache
                                                 states to Dirty Exclusive.

                   Dirty Exclusive   None        Set the primary data cache state to
                                                 Dirty Exclusive.

Dirty Exclusive†   Dirty Exclusive   None        None

† The dirty exclusive primary state allows the primary cache to be written without a
secondary access.

11.10 Manipulation of the Cache by an External Agent

Just as the processor accesses caches, so too can an external agent examine 
and manipulate the state and content of the primary and secondary caches 
through invalidate, update, snoop, and intervention transactions. 

These transactions are described in the following sections. Encodings of 
these request transactions are given in Chapter 12. 



Invalidate 

An invalidate request causes the processor to change the state of the 
specified cache line to invalid in both the primary and secondary caches. 



Update 

An update request causes the processor to write the specified data element 
into the specified cache line, and either change the state of the cache line 
to shared in both the primary and secondary caches, or leave the state of 
the cache line unchanged, depending on the nature of the update request. 
An external agent can issue updates to cache lines that are in either the 
exclusive or shared states without changing the state of the cache line (see 
the SysCmd(0) bit description in Chapter 12).

NOTE: If there is an update to a line in the primary instruction cache, 
the line in the secondary cache is updated and the primary instruction 
cache line is invalidated. 



Snoop 

A snoop request to the processor causes the processor to return the 
secondary cache state of the specified cache line. 

At the same time, the processor atomically† sets the state of the specified
cache line in both the primary and secondary caches according to the value 
of the SysCmd(2:0) bits, which define cache state change, and are supplied 
by the external agent. 



† An atomic operation is one that cannot be split, or portions of it deferred. In this case, the
processor sets the state of both secondary and primary caches in an indivisible action; it
cannot set the state of one cache line, allow another process to interrupt, and then
complete the first process by setting the state of the remaining cache line.

Intervention

An intervention request causes the processor to return the secondary 
cache state of the specified cache line and, under certain conditions related 
to the state of the cache line and the nature of the intervention request, the 
contents of the specified secondary cache line. 

At the same time, the processor atomically sets the state of the specified 
cache line in both the primary and secondary caches according to the value 
of the SysCmd(2:0) bits, which define cache state change, and are supplied
by an external agent. 

11.11 Coherency Conflicts 

The R4000MC processor must handle competing coherency conflicts that 
arise from the processor and an external source. This section describes 
how coherency conflicts arise and how they are handled. A system model 
illustrates the implications of coherency conflicts in a multiprocessor 
environment; a coherent read request cycle is described at the end of this 
section. 

Figure 11-11 shows the R4000MC processor issuing processor coherency 
requests and accepting external coherency requests. 


[Figure 11-11: the R4000MC sends processor coherency requests (coherent
read, invalidate, update) to the external agent; the external agent sends
external coherency requests (invalidate, update, snoop, intervention) to
the R4000MC.]

Figure 11-11 Coherency Requests: Processor and External

The R4000MC processor issues the following processor coherency
requests: 

• processor coherent read requests 

• processor invalidate requests 

• processor update requests 

The R4000MC processor accepts the following external coherency 
requests: 

• external invalidate requests 

• external update requests 

• external snoop requests 

• external intervention requests 

How Coherency Conflicts Arise 

Because of the overlapped nature of the system interface, it is possible for 
an external coherency request to target the same cache physical address as 
a pending processor read request, an unacknowledged processor 
invalidate, or an update request. The processor does not contain the 
comparison mechanism necessary to detect such conflicts; instead, it uses 
the secondary cache as a point of reference to determine suitable 
coherency actions, and only checks the state of the secondary cache at 
specific times. 

Processor Coherent Read Requests 

When the processor wants to service either a store or load cache miss for a 
page that has a coherent page attribute in the TLB (meaning the data 
passed back and forth should follow a defined multiprocessor coherency 
scheme), a coherent read request is used. 

Conflicting external coherency requests cannot affect the behavior of the 
processor for pending processor coherent read requests. The processor 
only issues read requests for a range of physical addresses not currently in 
the cache; consequently, an external coherency request that targets the 
same physical address range will not find this physical address range in
the cache. In such a case, the processor simply discards any external 
coherency requests that conflict with a pending processor coherent read 
request. 

Processor Invalidate or Update Requests

For processor invalidate or compulsory update requests, a cancellation 
mechanism indicates a conflict. For example, if an external coherency 
request is submitted while a processor invalidate or compulsory update 
request has been issued but not yet acknowledged, the conflict is resolved 
when the external agent cancels the processor invalidate or compulsory 
update. 

Cancellation is accomplished by setting the cancellation bit in the 
command for the coherency request [SysCmd(4)]. The processor, upon 
receiving an external coherency request with the cancellation bit set, 
considers its invalidate or update request to be acknowledged and 
cancelled. The processor again accesses the secondary cache to determine 
whether to reissue the invalidate or update request, or to issue a read 
request. 

An external agent can only assert the cancellation bit during an 
unacknowledged processor invalidate or unacknowledged compulsory 
update request. If an external coherency request is issued with the 
cancellation bit set, and there is no unacknowledged processor invalidate
or update request pending, the behavior of the processor is undefined. 

If an external coherency request is issued with the cancellation bit set 
when a processor update request remains potential — in other words, 
while a processor read request is currently pending — the behavior of the 
processor is undefined. 

Processor potential update requests cannot be cancelled. Potential
updates are always issued with processor read requests and become
compulsory only after the response to the processor read request is
returned in one of the shared states.

External Coherency Requests

If an external agent issues an external coherency request that conflicts with 
an unacknowledged processor invalidate or update request, without 
setting the cancellation bit, the system will operate in an undefined 
manner. In this case, the processor has no indication of the conflict and 
does not reevaluate the cache state to determine the correct action; it 
simply waits for an acknowledge to its invalidate or update request as it 
would for any invalidate or update request. 

It is not possible for external coherency requests to conflict with processor 
write requests, since the processor does not accept external requests while 
a processor write request is in progress. 

Tables 11-7 and 11-8 summarize the interactions between processor 
coherency requests and conflicting external coherency requests, organized 
by processor state. These two tables show the processor in one of the 
following states: 

Idle: no processor transactions are pending.

Read Pending: a processor coherent read request has been issued,
but the read response has not been received.

Potential Update Unacknowledged: a processor update request
has been issued while a processor coherent read request is pending
but not yet acknowledged. By definition, therefore, the response
to the coherent read request has not been received.

Invalidate or Update Unacknowledged: a processor invalidate or
update request has been issued but has not yet been acknowledged.
By definition, no coherent read request is pending.

Table 11-7 Summary of Coherency Conflicts: Invalidate and Update

                                       Conflicting External Coherency Request
Processor State                        Invalidate   Invalidate    Update   Update
                                                    with Cancel            with Cancel
------------------------------------   ----------   -----------   ------   -----------
Idle                                   NA           Undefined     NA       Undefined
Read Pending                           OK           Undefined     OK       Undefined
Potential Update Unacknowledged        OK           Undefined     OK       Undefined
Invalidate or Update Unacknowledged    OK†          OK            OK†      OK

† This can cause incorrect system operation and normally should not be allowed to occur.



Table 11-8 Summary of Coherency Conflicts: Intervention and Snoop

                                       Conflicting External Coherency Request
Processor State                        Intervention   Intervention   Snoop   Snoop
                                                      with Cancel            with Cancel
------------------------------------   ------------   ------------   -----   -----------
Idle                                   NA             Undefined      NA      Undefined
Read Pending                           OK             Undefined      OK      Undefined
Potential Update Unacknowledged        OK             Undefined      OK      Undefined
Invalidate or Update Unacknowledged    OK†            OK             OK†     OK

† This can cause incorrect system operation and normally should not be allowed to occur.
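
Tables 11-7 and 11-8 list the same outcome for all four external request types, so the result depends only on the processor state and on whether the cancellation bit is set. The following Python sketch encodes that observation; the names CONFLICT_TABLE and conflict_outcome and the "OK (caution)" label for the footnoted entries are assumptions made for this example.

```python
# (outcome without cancel, outcome with cancel) for each processor state.
# "OK (caution)" marks the footnoted entries: permitted, but it can cause
# incorrect system operation and normally should not be allowed to occur.
CONFLICT_TABLE = {
    "idle": ("NA", "Undefined"),
    "read pending": ("OK", "Undefined"),
    "potential update unacknowledged": ("OK", "Undefined"),
    "invalidate or update unacknowledged": ("OK (caution)", "OK"),
}

REQUEST_TYPES = {"invalidate", "update", "intervention", "snoop"}


def conflict_outcome(processor_state, request_type, cancel_bit):
    """Look up the result of a conflicting external coherency request."""
    if request_type not in REQUEST_TYPES:
        raise ValueError(f"unknown request type: {request_type!r}")
    without_cancel, with_cancel = CONFLICT_TABLE[processor_state]
    return with_cancel if cancel_bit else without_cancel
```

The lookup makes the design rule visible: the cancellation bit is safe only in the invalidate or update unacknowledged state, and in that state a conflicting request should carry it.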



System Implications of Coherency Conflicts 

The constraints that the processor must place on the handling of coherency 
conflicts have certain implications on the design of a multiprocessor 
system using the R4000MC model. These constraints and their 
implications are described in this section. 

System Model

To describe the implications of a coherency conflict, this section uses a 
system model that is snooping, split-read, and bus-based; I/O is not 
considered in this model. 

The system model used in this example has the following components: 

• Four processor subsystems, each consisting of an R4000MC
processor, a secondary cache, and an external agent (shown in
Figure 11-12). The external agent communicates with the
R4000MC processor, accepting processor requests and issuing
external requests. Likewise, the system bus issues and receives
bus requests.

• A memory subsystem that communicates with main memory 
and the system bus. 

• A system bus that has the following characteristics: 

- It is a multiple master, request-based, arbitrated bus. 
When an agent wishes to perform a transaction on the 
bus, it must request the bus and wait for global 
arbitration logic to assert a grant signal before assuming 
mastership of the bus. Once mastership has been 
granted, the agent can begin a transaction. 

- It supports read transactions, read exclusive transactions, 
write transactions, and invalidate transactions. 

- It is a split-read bus. This means bus operations can 
separate a read request from the return of its data. 

- It is a snooping bus. All agents connected to the bus 
must monitor all bus traffic to correctly maintain cache 
coherency. 

• All of the TLB pages in the system have either a noncoherent or 
a sharable coherency attribute. (Noncoherent data is not 
allowed; noncoherent page attributes are used for instructions 
only.) 

• The sharable coherency attribute allows data to be shared 
between the four caches in the system by using a write 
invalidate cache coherency protocol. 

• The secondary cache states used are invalid, shared, clean 
exclusive, and dirty exclusive; the dirty shared secondary 
cache state is not allowed. 

Figure 11-12 4-Processor System Illustrating Coherency Transactions

Given this system model, the following operations are described: 

• loads and stores 

• processor coherent read request and read response 

• processor invalidate 

• processor write 

Load

As shown in Figure 11-12, when a processor misses in the primary and
secondary caches on a load, the processor issues a read request. The
subsystem external agent translates this to a read request on the bus. The
returned data is loaded in either the clean exclusive or shared state, based
on the shared indication returned with the read response data.†

Store 

In this system model, when a processor misses in the primary and 
secondary caches on a store, it issues a read request with exclusivity; this 
is translated to a read exclusive on the bus and data is loaded in the dirty 
exclusive state. 

When a processor hits in the cache on a store to shared data, it issues an 
invalidate request that must be forwarded to the system bus. Before the 
store can be completed and the state changed to dirty exclusive, the 
invalidate request must be acknowledged. 
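
The load and store rules above can be summarized in a small Python sketch. It is only an illustration of this system model: the function name, state strings, and request labels are assumptions made for the example, not R4000 encodings, and the handling of a store hit to an exclusive line (no bus request) is likewise an assumption that the text does not state.

```python
# Hypothetical model of the load/store rules in this system model.
INVALID = "invalid"
SHARED = "shared"
CLEAN_EX = "clean exclusive"
DIRTY_EX = "dirty exclusive"


def access(kind, state, shared_indication=False):
    """Return (request issued by the processor, new line state).

    kind is "load" or "store". state is the current secondary cache state,
    with INVALID standing for a miss. shared_indication is the indication
    returned with the read response and only matters for a load miss.
    """
    if kind == "load":
        if state == INVALID:
            # Load miss: read request; fill state follows the shared indication.
            return ("read", SHARED if shared_indication else CLEAN_EX)
        return (None, state)  # load hit: no request, state unchanged
    if kind == "store":
        if state == INVALID:
            # Store miss: read with exclusivity, line loaded dirty exclusive.
            return ("read exclusive", DIRTY_EX)
        if state == SHARED:
            # Store hit to shared data: the invalidate must be acknowledged
            # before the state changes to dirty exclusive.
            return ("invalidate", DIRTY_EX)
        # Assumption: a store hit to an exclusive line needs no bus request.
        return (None, DIRTY_EX)
    raise ValueError("kind must be 'load' or 'store'")
```

For example, `access("load", INVALID, shared_indication=True)` yields a read request that fills the line in the shared state.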

Processor Coherent Read Request and Read Response 

In this system model, when one of the external agents observes a coherent 
read request on the system bus, it does not take immediate action. Instead, 
the external agent issues an intervention request to its processor during 
the read response. This is referred to as a response complete read protocol; 
that is, the read is complete after the read response has occurred. 

At the end of the read response, each of the external agents in the system
model indicates whether it was able to obtain the state of the cache line that
is the target of the intervention; if successful, the external agent indicates 
either sharing or takeover. Takeover occurs when an external agent 
discovers that its processor has a dirty exclusive copy of the cache line that 
is the target of the read. 

The read response is extended until all external agents have obtained the 
state of the cache line from their processors. 

In this system model, the response from an external agent at the end of a 
read response depends on whether the read request was an ordinary read 
request or a read exclusive request. These are described in the following 
sections. 



† The shared indication is the result of an intervention request to another processor, and is
supplied by an external agent that is part of one of the other three processor subsystems.

Ordinary Read Request

For an ordinary read request, an external agent indicates shared at the end 
of the read response if it finds that its processor has a copy of the requested 
cache line in the clean exclusive or shared state. 

An external agent indicates both shared and takeover at the end of a read 
response if it finds that its processor has a copy of the requested cache line 
in the dirty exclusive state. Having indicated takeover, the external agent 
supplies the contents of the cache line (returned by the processor in 
response to the intervention request) over the bus to the read requester, 
and causes the processor to change the state of the cache line to shared. At 
the same time the cache line is supplied to the read requester, it is also 
written back to memory. 

Read Exclusive Request 

For a read exclusive request, an external agent never indicates shared at
the end of the read response, regardless of the state the cache line is in.
Instead, its action depends on the current state of the cache line:

• If the current state of the cache line is clean exclusive or shared, 
the external agent changes the state of the cache line to invalid. 

• If the current state of the cache line is dirty exclusive, the 
external agent indicates takeover but not shared. Having 
indicated takeover, the external agent supplies the contents of 
the cache line to the read requester, and the processor changes 
the state of the cache line to invalid. While the cache line is 
supplied to the read requester, it is also written back to 
memory. 
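
The responses described for ordinary read and read exclusive requests can be collected into one decision function. The following Python sketch is a hedged illustration: the function name and the tuple layout are invented for the example, and the transition from clean exclusive to shared on an ordinary read is an assumption (the text only says that the agent indicates shared in that case).

```python
def snoop_response(state, exclusive):
    """Model an external agent's indication at the end of a read response.

    state is the line state found in the agent's own processor; exclusive
    is True for a read exclusive request, False for an ordinary read.
    Returns (shared, takeover, new_state, written_back_to_memory).
    """
    if state == "invalid":
        return (False, False, "invalid", False)
    if state == "dirty exclusive":
        # Takeover: the agent supplies the line to the requester, and the
        # line is written back to memory at the same time.
        if exclusive:
            return (False, True, "invalid", True)
        return (True, True, "shared", True)
    # state is "clean exclusive" or "shared"
    if exclusive:
        return (False, False, "invalid", False)
    # Assumption: a clean exclusive line becomes shared once another
    # cache obtains a copy.
    return (True, False, "shared", False)
```

Note that shared is never indicated when `exclusive` is true, which matches the rule stated at the start of the Read Exclusive Request section.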

Processor Invalidate 

In this system model, an invalidate request is considered complete as soon 
as it appears on the system bus. When an external agent observes an 
invalidate request on the system bus, it reacts as if the invalidate has 
changed the state of all caches at that instant. 

Processor Write 

In this system model, an external agent takes no action in response to a 
write request on the bus. 

Handling Coherency Conflicts

Coherency conflicts are examined and resolved based on the current state 
of the processor. Referring to Figure 11-12, the following conflicts and 
their resolutions are described in this section: 

• coherent read conflicts 

• coherent write conflicts 

• invalidate conflicts 

Coherent Read Conflicts 

External coherency requests that conflict with pending processor coherent 
read requests can be issued to the processor without affecting processor 
behavior. In the system model shown in Figure 11-12, no conflict 
detection is performed by the external agent for processor coherent read 
requests; if an external intervention request or invalidate request is 
forwarded to the processor that is in conflict with a pending processor 
coherent read request, it does not affect the processor cache since the 
targeted cache line is, by definition, absent from the cache. The processor 
effectively discards the conflicting external intervention request, 
responding with an invalid indication for the targeted cache line. 
Similarly, the processor discards a conflicting external invalidate request 
since the targeted cache line is not present and therefore invalid. 

For pending processor coherent read requests, conflict detection could be 
added to a system similar to the one shown in Figure 11-12. In such a case, 
when the external agent sees a read response on the bus that conflicts with 
a pending processor coherent read request, the external agent does not 
issue an intervention request to the processor. Rather, it simply reacts as 
if an intervention request has been completed and the cache line is not 
present in the processor cache. 

Similarly, when an external agent sees an invalidate request on the bus 
that conflicts with a pending processor coherent read request, it does not 
forward the invalidate request to the processor since the targeted cache 
line is absent from the processor cache. This scheme for conflict detection 
on processor coherent read requests could reduce the number of external 
intervention and invalidate requests issued to the processor. However, 
since the intervention and invalidate requests that would otherwise be 
issued to the processor cannot result in any state modification within the 
processor (since the targeted cache line is not present in the cache), conflict 
detection for processor coherent read requests is not necessary. 

Coherent Write Conflicts

As soon as a write request has been issued to the external agent, the 
external agent becomes responsible for the cache line. No conflicts are 
possible with a processor write request; however, the external agent must 
manage ownership of the cache line while it is waiting to acquire 
mastership of the system bus so that it can forward the write request. The 
external agent is responsible for the cache line from the time the issue cycle 
of the write request completes until the write request is forwarded to the
system bus. 

If the response to a coherent read request conflicts with a waiting 
processor write request, or with a processor write request that is 
transmitting data, the external agent detects the conflict and does not issue 
an intervention request to the processor. Instead, it reacts as if an 
intervention request has been completed and the line is in the dirty 
exclusive state. The external agent indicates takeover and supplies the 
read data to the read requester itself without disturbing the processor. 
After providing the read data to the read requester, the external agent 
must discard the write request if the read request was a read exclusive. In 
fact, the external agent can ignore the write request for either type of read, 
since processor-supplied read data is also written back to memory. 

It is not possible for an invalidate request, or a write request that conflicts 
with a waiting processor write request, to appear on the system bus; 
before a processor write request can be issued, the state of the processor 
cache line must be set to dirty exclusive. 

Invalidate Conflicts

From the time the processor issues an invalidate request until that request 
is acknowledged, any external coherency request issued to the processor 
that conflicts with the unacknowledged invalidate must include a 
cancellation. 

In the model system shown in Figure 11-12, an acknowledge for the 
invalidate is sent to the processor as soon as the invalidate is forwarded to 
the system bus. Therefore, while the external agent is waiting to become 
a bus master to forward the invalidate request, the external agent must 
detect, by using comparators, any external coherency request that conflicts 
with the unacknowledged invalidate. If a conflict is detected, the external 
agent must not forward the invalidate request to the system bus; instead, 
it must rescind the invalidate request and submit the conflicting external 
request to the processor, with a cancellation for the invalidate request. 

If the response to a coherent read request conflicts with a waiting 
unacknowledged processor invalidate request, the external agent detects 
this conflict and does not forward the processor invalidate request to the 
bus. Instead, it discards the processor invalidate request and issues to the 
processor an intervention request that includes a cancellation. The 
processor then reevaluates its cache state and either reissues the invalidate 
request or issues a coherent read request. 

If an invalidate request appears on the bus while the external agent has a 
processor invalidate request waiting, and the external agent detects the 
conflict, the external agent does not forward the processor invalidate 
request. Instead, it discards the processor invalidate request and issues an 
external invalidate request that includes a cancellation to the processor. 
The processor then reevaluates its cache state and either reissues the 
invalidate request or issues a coherent read request. 

It is not possible for a write request that conflicts with a waiting processor 
invalidate request to appear on the system bus. To issue an invalidate 
request, the state of the cache line must be shared with every cache in the 
system that contains the line. 
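
The comparator-based conflict detection described in this section might be modeled as follows. This Python sketch is an assumption-laden illustration, not a description of any real external agent: the class and method names are hypothetical, the 32-byte line size is chosen only for the example, and the buffer holds a single waiting invalidate.

```python
class InvalidateBuffer:
    """Hypothetical external-agent buffer for one unacknowledged invalidate."""

    def __init__(self, line_bytes=32):  # line size is an assumed example value
        self.line_bytes = line_bytes
        self.pending = None  # cache line number of the waiting invalidate

    def _line(self, addr):
        return addr // self.line_bytes

    def queue_invalidate(self, addr):
        """Processor invalidate request waiting for bus mastership."""
        self.pending = self._line(addr)

    def observe(self, kind, addr):
        """React to a coherent read response ("read") or an "invalidate"
        seen on the bus. Returns (request sent to processor, cancellation)."""
        request = "intervention" if kind == "read" else "invalidate"
        if self.pending is not None and self.pending == self._line(addr):
            # Conflict: discard the processor invalidate instead of
            # forwarding it, and attach a cancellation to the external request.
            self.pending = None
            return (request, True)
        return (request, False)

    def bus_granted(self):
        """Forward the waiting invalidate to the bus, if one remains.
        Returns the forwarded line address, or None if nothing is pending."""
        if self.pending is None:
            return None
        line, self.pending = self.pending, None
        return line * self.line_bytes
```

After a cancellation, the processor is expected to reevaluate its cache state and either reissue the invalidate or issue a coherent read request; that step is outside this sketch.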

Sample Cycle: Coherent Read Request

This section describes a multiprocessor system within which a coherent
read request cycle† services a secondary cache load miss. The system has
two processors, PA and PB, and two external agents linked to these
processors, external agent A (EA) and external agent B (EB). The external
agents connect the processors to a system bus. Each of the processors has
its own secondary cache.

The sample cycle follows the steps below (these steps are also numbered
in Figures 11-13, 11-14, and 11-15):

1. Processor B has a load miss within a sharable page. 

2. Processor B issues a coherent read request (CRR) through EB.

3. The CRR is placed on the bus. 

Figure 11-13 Cache Load Miss Cycle: Coherent Read Request

† Request cycles are described in Chapter 12.






[Figure 11-14 labels: System Bus; Memory; External Agent A (EA); External Intervention Request (EIR); Processor A (PA); External Agent B (EB); Processor B (PB); Secondary Cache A (SA), line state DE; Secondary Cache B (SB)]



Figure 11-14 Cache Load Miss Cycle: External Intervention 

4. As shown in Figure 11-14, external agent EA reads the CRR from the
bus.

5. To service this CRR, EA issues an external intervention request (EIR)
to processor A, PA.

6. PA receives the EIR and examines its secondary cache, SA.

7. Depending on the type of intervention request (based on the state of
the SysCmd(3) bit), one of the following actions is taken:

• If the cache line in SA is in the dirty exclusive state, the entire
cache line is returned.

• Otherwise, PA just returns the state of the secondary cache line.

In Figure 11-14 the retrieved data is in the dirty exclusive state (DE).
Servicing the load miss changes the state of the cache line from dirty
exclusive to dirty shared (DS),† indicating that PA is owner of the line.



† Assuming DS mode is enabled.

Figure 11-15 Cache Load Miss Cycle: Read Response



8. Figure 11-15 shows the cache state and cache data returned from PA,
through EA to the bus.

9. This cache state and data are returned to EB.

10. EB issues a read response to PB.

11. PA remains owner of the cache line.

11.12 R4000 Processor Synchronization Support

In a multiprocessor system, it is essential that two or more processors
working on a common task execute without corrupting each other's
subtasks. Synchronization, an operation that guarantees an orderly access
to shared memory, must be implemented for a properly functioning
multiprocessor system. Two of the more widely used methods are
discussed in this section: test-and-set, and counter.

Test-and-Set (Spinlock) 

Test-and-set† uses a variable called the semaphore, which protects data
from being simultaneously modified by more than one processor.

In other words, a processor can lock out other processors from accessing
shared data when the processor is in a critical section, a part of a program in
which no more than a fixed number of processors is allowed to execute. In
the case of test-and-set, only one processor can enter the critical section.

Figure 11-16 illustrates a test-and-set synchronization procedure that uses 
a semaphore; when the semaphore is set to 0, the shared data is unlocked, 
and when the semaphore is set to 1, the shared data is locked. 



† Test-and-set is sometimes referred to as spinlock.

Figure 11-16 Synchronization with Test-and-Set

The processor begins by loading the semaphore and checking to see if it is 
unlocked (set to 0) in steps 1 and 2. If the semaphore is not 0, the processor 
loops back to step 1. If the semaphore is 0, indicating the shared data is 
not locked, the processor next tries to lock out any other access to the 
shared data (step 3). If not successful, the processor loops back to step 1, 
and reloads the semaphore. 

If the processor is successful at setting the semaphore (step 4), it executes 
the critical section of code (step 5) and gains access to the shared data, 
completes its task, unlocks the semaphore (step 6), and continues
processing. 
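
The six steps can be sketched in Python as a spinlock. This is a software illustration only: the class name is hypothetical, and an ordinary `threading.Lock` stands in for the hardware atomicity that makes the "try to lock" step indivisible. The `time.sleep(0)` call simply lets other threads run while spinning and is not part of the procedure described above.

```python
import threading
import time


class TestAndSetLock:
    """Hypothetical spinlock following steps 1-6 of the procedure."""

    def __init__(self):
        self._semaphore = 0               # 0 = unlocked, 1 = locked
        self._atomic = threading.Lock()   # stands in for hardware atomicity

    def _try_set(self):
        """Steps 3-4: atomically set the semaphore if it is still 0."""
        with self._atomic:
            if self._semaphore == 0:
                self._semaphore = 1
                return True
            return False

    def acquire(self):
        while True:
            if self._semaphore != 0:      # steps 1-2: load and test
                time.sleep(0)             # yield, then loop back to step 1
                continue
            if self._try_set():           # step 3: try to lock
                return                    # step 4: success
            # not successful: loop back to step 1 and reload

    def release(self):
        self._semaphore = 0               # step 6: unlock the semaphore
```

A caller brackets its critical section (step 5) with `acquire()` and `release()`.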

Counter

Another common synchronization technique uses a counter. A counter is a
designated memory location that can be incremented or decremented. 

In the test-and-set method, only one processor at a time is permitted to 
enter the critical section. Using a counter, up to N processors are allowed 
to concurrently execute the critical section. All processors after the Nth 
processor must wait until one of the N processors exits the critical section 
and a space becomes available. 

The counter works by not allowing more than one processor to modify it 
at any given time. Conceptually, the counter can be viewed as a variable 
that counts the number of limited resources (for example, the number of 
processes, or software licenses, etc.). Figure 11-17 shows this process. 
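
A counter that admits up to N processors can be sketched the same way. In the Python illustration below, the class and method names are assumptions, and a `threading.Lock` again stands in for the guarantee that only one processor modifies the counter at a time.

```python
import threading


class CounterGate:
    """Hypothetical counter admitting at most `slots` concurrent holders."""

    def __init__(self, slots):
        self._count = slots               # number of free slots remaining
        self._atomic = threading.Lock()   # one modifier at a time

    def try_enter(self):
        """Decrement the counter if a slot is free; report success."""
        with self._atomic:
            if self._count > 0:
                self._count -= 1
                return True
            return False                  # all N slots taken: caller must wait

    def leave(self):
        """Increment the counter when leaving the critical section."""
        with self._atomic:
            self._count += 1
```

With `CounterGate(2)`, two callers enter, a third is refused until one of the first two calls `leave()`.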

Figure 11-17 Synchronization Using a Counter

LL and SC

MIPS instructions Load Linked (LL) and Store Conditional (SC) provide 
support for processor synchronization. These two instructions work very 
much like their simpler counterparts, load and store. The LL instruction, 
in addition to doing a simple load, has the side effect of setting a bit called 
the link bit. This link bit forms a breakable link between the LL instruction
and the subsequent SC instruction. The SC performs a simple store if the 
link bit is set when the store executes. If the link bit is not set, then the store 
fails to execute. The success or failure of the SC is indicated in the target 
register of the store. 

The link is broken in the following circumstances:†

• if any external request (invalidate, snoop, or intervention)
changes the state of the line containing the lock variable to
invalid

• upon completion of an ERET (return from exception)
instruction

• if an external update is made to the cache line containing the
lock variable

The most important features of LL and SC are: 

• They provide a mechanism for generating all of the common
synchronization primitives including test-and-set, counters,
sequencers, etc., with no additional overhead.

• When they operate, bus traffic is generated only if the state of 
the cache line changes; lock words stay in the cache until some 
other processor takes ownership of that cache line. 
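
The link-bit behavior can be illustrated with a toy software model. The Python sketch below is not how the processor implements LL and SC: it models one memory word, keeps a set of processors whose link bit is set, and treats any successful store as breaking every link to the word (including the storing processor's own, which is a modeling choice). All names are hypothetical.

```python
class LinkedWord:
    """Toy model of one memory word accessed with LL and SC."""

    def __init__(self, value=0):
        self.value = value
        self._linked = set()   # processors whose link bit is currently set

    def ll(self, cpu):
        """Load Linked: return the value and set this processor's link bit."""
        self._linked.add(cpu)
        return self.value

    def sc(self, cpu, value):
        """Store Conditional: store only if the link bit is still set.
        Returns 1 on success and 0 on failure, like the SC target register."""
        if cpu not in self._linked:
            return 0
        self.value = value
        # The store invalidates other copies of the line, breaking their links.
        self._linked.clear()
        return 1

    def eret(self, cpu):
        """Completion of an ERET breaks this processor's link."""
        self._linked.discard(cpu)


def fetch_and_add(word, cpu, delta):
    """Retry LL/SC until the store succeeds; return the value that was read."""
    while True:
        old = word.ll(cpu)
        if word.sc(cpu, old + delta):
            return old
```

If two processors both execute `ll` and one then completes `sc`, the other's `sc` returns 0 and it must retry, which is the property that primitives such as test-and-set and counters rely on.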



† The most obvious case in which the link is broken occurs when an invalidate targets the
cache line that is the subject of the load. In this case, some other processor has successfully
completed a store to that line.

Examples Using LL and SC



Figure 11-18 shows how to implement test-and-set using LL and SC 
instructions. 
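
Figure 11-18 Test-and-Set using LL and SC

Figure 11-19 shows synchronization using a counter.

    Loop1:  LL    r2,(r1)       # load counter and set the link bit
            BLEZ  r2,Loop1      # counter <= 0: no slot free, reload
            NOP                 # branch delay slot
            SUB   r3,r2,1       # try decrementing counter
            SC    r3,(r1)       # r3 reports success or failure of the store
            BEQ   r3,0,Loop1    # store failed: retry
            NOP                 # branch delay slot

            # execute critical section

    Loop2:  LL    r2,(r1)       # reload counter and set the link bit
            ADD   r3,r2,1       # increment counter on exit
            SC    r3,(r1)       # r3 reports success or failure of the store
            BEQ   r3,0,Loop2    # store failed: retry
            NOP                 # branch delay slot

            # continue processing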



Figure 11-19 Counter Using LL and SC 

The System interface allows the processor to access external resources
needed to satisfy cache misses and uncached operations, while permitting
an external agent access to some of the processor internal resources.

In the R4000MC configuration, the System interface also provides the
processor with mechanisms to maintain the cache coherency of shared
data, while providing an external agent the mechanisms to maintain
system-wide multiprocessor cache coherency.

This chapter describes the System interface from the point of view of both 
the processor and the external agent. 

12.1 Terminology

The following terms are used in this chapter: 

• An external agent is any logic device connected to the processor, 
over the System interface, that allows the processor to issue 
requests. 

• A system event is an event that occurs within the processor and 
requires access to external system resources. 

• Sequence refers to the precise series of requests that a processor
generates to service a system event.

• Protocol refers to the cycle-by-cycle signal transitions that occur 
on the System interface pins to assert a processor or external 
request. 

• Syntax refers to the precise definition of bit patterns on 
encoded buses, such as the command bus. 

12.2 System Interface Description 

The R4000 processor supports a 64-bit address/data interface that can be used
to construct systems ranging from a simple uniprocessor with main memory
to a multiprocessor system with caches and complete cache coherency. 
The System interface consists of: 

• 64-bit address and data bus, SysAD 

• 8-bit SysAD check bus, SysADC 

• 9-bit command bus, SysCmd 

• eight handshake signals: 

- RdRdy*, WrRdy* 

- ExtRqst*, Release* 

- ValidIn*, ValidOut*

- IvdAck*, IvdErr* 

The processor uses the System interface to access the external resources
needed to satisfy cache misses and uncached operations. In the case of the
R4000MC, the System interface also supports multiprocessor cache coherency.

Interface Buses

Figure 12-1 shows the primary communication paths for the System 
interface: a 64-bit address and data bus, SysAD(63:0), and a 9-bit 
command bus, SysCmd(8:0). The SysAD and SysCmd buses are
bidirectional; that is, they are driven by the processor to issue a processor
request, and by the external agent to issue an external request (see
Processor and External Requests, in this chapter, for more information).

A request through the System interface consists of: 

• an address 

• a System interface command that specifies the precise nature of 
the request 

• a series of data elements if the request is for a write, read 
response, or update. 

Figure 12-1 System Interface Buses

Address and Data Cycles

Cycles in which the SysAD bus contains a valid address are called address 
cycles. Cycles in which the SysAD bus contains valid data are called data 
cycles. Validity is determined by the state of the ValidIn* and ValidOut*
signals (described in Interface Buses, in this chapter). 

The SysCmd bus identifies the contents of the SysAD bus during any 
cycle in which it is valid. The most significant bit of the SysCmd bus is 
always used to indicate whether the current cycle is an address cycle or a 
data cycle. 

• During address cycles [SysCmd(8) = 0], the remainder of the 
SysCmd bus, SysCmd(7:0), contains a System interface command 
(the encoding of System interface commands is detailed in 
System Interface Commands and Data Identifiers, in this 
chapter). 

• During data cycles [SysCmd(8) = 1], the remainder of the 
SysCmd bus, SysCmd(7:0), contains a data identifier (the 
encoding of data identifiers is detailed in System Interface 
Commands and Data Identifiers, in this chapter). 
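
The role of SysCmd(8) can be shown with a short decoding sketch in Python. The function name and the returned labels are assumptions for the example; the contents of SysCmd(7:0) are passed through undecoded, since their encodings are defined elsewhere in this chapter.

```python
def decode_syscmd(syscmd):
    """Classify a 9-bit SysCmd value by its most significant bit.

    Returns ("address", command) when SysCmd(8) = 0 and
    ("data", identifier) when SysCmd(8) = 1, where the second element
    is the raw SysCmd(7:0) field.
    """
    if not 0 <= syscmd < (1 << 9):
        raise ValueError("SysCmd is a 9-bit bus")
    low = syscmd & 0xFF               # SysCmd(7:0)
    if (syscmd >> 8) & 1 == 0:        # SysCmd(8) = 0: address cycle
        return ("address", low)
    return ("data", low)              # SysCmd(8) = 1: data cycle
```

For instance, `decode_syscmd(0x1A5)` reports a data cycle whose identifier field is `0xA5`.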



Issue Cycles 

There are two types of processor issue cycles: 

• processor read, invalidate, and update request issue cycles 

• processor write request issue cycles. 

The processor samples the signal RdRdy* to determine the issue cycle for 
a processor read, invalidate, or update request; the processor samples the 
signal WrRdy* to determine the issue cycle of a processor write request. 

As shown in Figure 12-2, RdRdy* must be asserted two cycles prior to the 
address cycle of the processor read/invalidate/update request to define 
the address cycle as the issue cycle. 



Figure 12-2 State of RdRdy* Signal for Read, Invalidate, or Update Requests

As shown in Figure 12-3, WrRdy* must be asserted two cycles prior to the
first address cycle of the processor write request to define the address
cycle as the issue cycle.



[Timing diagram signals: SCycle, SClock, SysAD Bus (Addr), WrRdy*]



Figure 12-3 State of WrRdy* Signal for Write Requests 

The processor repeats the address cycle for the request until the conditions 
for a valid issue cycle are met. After the issue cycle, if the processor 
request requires data to be sent, the data transmission begins. There is 
only one issue cycle for any processor request. 
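
The "asserted two cycles prior" rule can be expressed as a search over a per-cycle trace. In this Python sketch the function name and the trace representation are assumptions: each list element is True when RdRdy* (or WrRdy*, for a write) is asserted in that cycle, and cycles that fall before the start of the trace are treated as not asserted.

```python
def find_issue_cycle(ready_asserted, first_address_cycle):
    """Return the cycle in which a repeated address cycle becomes the issue cycle.

    ready_asserted[i] is True when the ready signal is asserted in cycle i.
    The processor repeats the address cycle starting at first_address_cycle;
    the issue cycle is the first of those cycles whose ready signal was
    asserted two cycles earlier. Returns None if no such cycle is in the trace.
    """
    for cycle in range(first_address_cycle, len(ready_asserted)):
        if cycle >= 2 and ready_asserted[cycle - 2]:
            return cycle
    return None
```

With a ready trace of `[False, False, True, True, False, False]` and the first address cycle at cycle 2, the address is repeated in cycles 2 and 3 and issues in cycle 4.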

The processor accepts external requests, even while attempting to issue a 
processor request, by releasing the System interface to slave state in 
response to an assertion of ExtRqst* by the external agent. 

Note that the rules governing the issue cycle of a processor request are 
strictly applied to determine the action the processor takes. The processor 
either: 

• completes the issuance of the processor request in its entirety 
before the external request is accepted, or 

• releases the System interface to slave state without completing 
the issuance of the processor request. 

In the latter case, the processor issues the processor request (provided the 
processor request is still necessary) after the external request is complete. 
The rules governing an issue cycle again apply to the processor request. 

Handshake Signals

The processor manages the flow of requests through the following eight 
control signals: 

• RdRdy*, WrRdy* are used by the external agent to indicate 
when it can accept a new read (RdRdy*) or write (WrRdy*) 
transaction. 

• ExtRqst*, Release* are used to transfer control of the SysAD 
and SysCmd buses. ExtRqst* is used by an external agent to 
indicate a need to control the interface. Release* is asserted by 
the processor when it transfers the mastership of the System 
interface to the external agent. 

• The R4000 processor uses ValidOut* and the external agent
uses ValidIn* to indicate valid command/data on the
SysCmd/SysAD buses.

• IvdAck*, IvdErr* are used in multiprocessor systems; they are 
asserted by the external agent to indicate the successful 
completion (IvdAck*) or the unsuccessful completion (IvdErr*) 
of a pending processor invalidate or update request.†

† When using the R4000SC processor, IvdAck* and IvdErr* must be connected to Vcc.

12.3 System Interface Protocols

Figure 12-4 shows that the System interface operates from register to register.
That is, processor outputs come directly from output registers and begin
to change with the rising edge of SClock.†

Processor inputs are fed directly to input registers that latch these input 
signals with the rising edge of SClock. This allows the System interface to 
run at the highest possible clock frequency. 

Figure 12-4 System Interface Register-to-Register Operation

Master and Slave States 

When the R4000 processor is driving the SysAD and SysCmd buses, the 
System interface is in master state. When the external agent is driving the 
SysAD and SysCmd buses, the System interface is in slave state. 

In master state, the processor asserts the signal ValidOut* whenever the 
SysAD and SysCmd buses are valid. 

In slave state, the external agent asserts the signal ValidIn* whenever the
SysAD and SysCmd buses are valid. 



† SClock is an internal clock used by the processor to sample data at the System interface
and to clock data into the processor System interface output registers; see Chapter 10 for
more details.

Moving from Master to Slave State

The System interface remains in master state unless one of the following
occurs: 

• The external agent requests and is granted the System interface 
(external arbitration). 

• The processor issues a read request or completes the issue of a 
cluster (uncompelled change to slave state). 

External Arbitration 

The System interface must be in slave state for the external agent to issue 
an external request through the System interface. The transition from 
master state to slave state is arbitrated by the processor using the System 
interface handshake signals ExtRqst* and Release*. This transition is
described by the following procedure:

1. An external agent signals that it wishes to issue an external request by 
asserting ExtRqst*. 

2. When the processor is ready to accept an external request, it releases 
the System interface from master to slave state by asserting Release* 
for one cycle. 

3. The System interface returns to master state as soon as the issue of the 
external request is complete. 

This process is described in External Arbitration Protocol, later in this 
chapter. 
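
The three-step procedure can be pictured as a two-state machine. The Python sketch below is a simplification under stated assumptions: the class, method, and parameter names are hypothetical, signals are modeled as booleans meaning "asserted", and the exact cycle in which the state changes relative to Release* is not taken from the protocol description.

```python
class SystemInterface:
    """Hypothetical master/slave model of external arbitration."""

    def __init__(self):
        self.state = "master"

    def step(self, ext_rqst, processor_ready=True, external_issue_done=False):
        """Advance one cycle. Returns True if Release* is asserted this cycle.

        ext_rqst: the external agent is asserting ExtRqst* (step 1).
        processor_ready: the processor is ready to accept an external request.
        external_issue_done: the issue of the external request has completed.
        """
        if self.state == "master":
            if ext_rqst and processor_ready:
                self.state = "slave"   # step 2: release for one cycle
                return True
            return False
        if external_issue_done:
            self.state = "master"      # step 3: return to master state
        return False
```

Calling `step(True)` while in master state returns True once and leaves the model in slave state until a later call reports `external_issue_done=True`.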

Uncompelled Change to Slave State

An uncompelled change to slave state is the transition of the System 
interface from master state to slave state, initiated by the processor when 
a processor read request is pending. Release* is asserted automatically 
after a read request or cluster (see Clusters, later in this chapter, for a 
definition of a cluster). An uncompelled change to slave state occurs either 
during or some number of cycles after the issue cycle of a read request, or 
either during or some number of cycles after the last cycle of the last 
request in a cluster. 

The uncompelled release latency depends on the state of the cache, the 
presence or absence of a secondary cache, and the secondary cache 
parameters (see Release Latency, in this chapter). After an uncompelled 
change to slave state, the processor returns to master state at the end of the 
next external request. This can be a read response, or some other type of 
external request. 

An external agent must note that the processor has performed an 
uncompelled change to slave state and begin driving the SysAD bus along 
with the SysCmd bus. As long as the System interface is in slave state, the 
external agent can begin an external request without arbitrating for the 
System interface; that is, without asserting ExtRqst*. 

After the external request, the System interface returns to master state. 

Whenever a processor read request is pending, after the issue of a read 
request or after the issue of all of the requests in a cluster, the processor 
automatically switches the System interface to slave state, even though the 
external agent is not arbitrating to issue an external request. This 
transition to slave state allows the external agent to return read response 
data. 

12.4 Processor and External Requests

There are two broad categories of requests: processor requests and external 
requests. These two categories are described in this section. 

When a system event occurs, the processor issues either a single request or
a series of requests (called processor requests) through the System
interface, to access an external resource and service the event. For this to
work, the processor System interface must be connected to an external 
agent that is compatible with the System interface protocol, and can 
coordinate access to system resources. 

An external agent requesting access to processor caches or to a processor 
status register generates an external request. This access request passes 
through the System interface. System events and request cycles are shown 
in Figure 12-5. 

Figure 12-5 Requests and System Events

Rules for Processor Requests

The following rules apply to processor requests. 

• After issuing a processor read request, either individually or as 
part of a cluster, the processor cannot issue a subsequent read 
request until it has received a read response. 

• After issuing a processor update request, or after a potential 
update request becomes compulsory, the processor cannot 
issue a subsequent request until it has received an 
acknowledge for the update request. 

• After the processor has issued a write request, the processor 
cannot issue a subsequent request until at least four cycles after 
the issue cycle of the write request. This means back-to-back 
write requests with a single data cycle are separated by two 
unused system cycles, as shown in Figure 12-6.
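
The arithmetic behind the two unused cycles can be checked with a short Python sketch. The function names are hypothetical, and the sketch assumes that a write occupies one address (issue) cycle followed immediately by its data cycles.

```python
WRITE_SPACING = 4  # minimum cycles from a write's issue cycle to the next issue


def earliest_next_issue(write_issue_cycle):
    """Earliest cycle in which a request can follow a write request."""
    return write_issue_cycle + WRITE_SPACING


def unused_cycles_after_write(data_cycles=1):
    """Idle cycles before the earliest next issue.

    Assumes the write uses one address cycle plus `data_cycles` data cycles
    with no gaps between them.
    """
    busy = 1 + data_cycles
    return max(0, WRITE_SPACING - busy)
```

For a write issued in cycle 0 with a single data cycle in cycle 1, the next request can issue no earlier than cycle 4, which leaves cycles 2 and 3 unused.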

Figure 12-6 Back-to-Back Write Cycle Timing

Processor Requests

A processor request is a request or a series of requests, through the System 
interface, to access some external resource. As shown in Figure 12-7, 
processor requests include read, write, null write, invalidate, and update. 
This section also describes clusters. 

Figure 12-7 Processor Requests



Read request asks for a block, doubleword, partial doubleword, word, or 
partial word of data either from main memory or from another system 
resource. 

Write request provides a block, doubleword, partial doubleword, word, or 
partial word of data to be written either to main memory or to another 
system resource. 

Null write request indicates that an expected write has been cancelled as a 
result of an external request. 

Invalidate request specifies a line in every other cache in the system that 
must be marked invalid. 

Update request provides a block, doubleword, partial doubleword, word, 
or partial word of data that must be transferred to every other cache in the 
system. 

Table 12-1 lists the processor requests that each type of R4000 can issue. 



Table 12-1 Supported Processor Requests

Request                 R4000PC   R4000SC   R4000MC
----------------------  -------   -------   -------
Processor Read             X         X         X
Processor Write            X         X         X
Processor Null Write                 X         X
Processor Invalidate                           X
Processor Update                               X
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Processor requests are managed by the processor in two distinct modes: 
secondary-cache mode and no-secondary-cache mode (see Chapter 11 for a 
description of these two modes), which are programmable through the 
boot-time mode control interface described in Chapter 9. 

The permissible modes of operation are dependent on the following 
processor package configurations; if not programmed correctly, the 
behavior of the processor is undefined. 

• An R4000PC must be programmed to run in no-secondary- 
cache mode. 

• An R4000SC or R4000MC can be programmed to run in either 
secondary-cache or no-secondary-cache mode. 
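
The package constraints above can be captured in a small lookup. This is a sketch for illustration; the dictionary and function names are assumptions and are not part of the processor's programming interface.

```python
# Hypothetical check of boot-time mode against processor package type.
# The permitted combinations are taken from the two bullets above.

ALLOWED_MODES = {
    "R4000PC": {"no-secondary-cache"},
    "R4000SC": {"secondary-cache", "no-secondary-cache"},
    "R4000MC": {"secondary-cache", "no-secondary-cache"},
}


def mode_is_permitted(package, mode):
    """Return True if the package may be programmed to run in this mode."""
    return mode in ALLOWED_MODES.get(package, set())
```

A configuration tool could call `mode_is_permitted("R4000PC", "secondary-cache")` and reject the setting, since processor behavior is undefined in that case.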

In no-secondary-cache mode, the processor issues requests in a strict 
sequential fashion; that is, the processor is only allowed to have one 
request pending at any time. For example, the processor issues a read 
request and waits for a read response before issuing any subsequent 
requests. The processor submits a write request only if there are no read 
requests pending. 

The processor has the input signals RdRdy* and WrRdy* to allow an 
external agent to manage the flow of processor requests. RdRdy* controls 
the flow of processor read, invalidate, and update requests, while WrRdy* 
controls the flow of processor write requests. Processor null write requests 
must always be accepted and cannot be delayed by either RdRdy* or 
WrRdy*. The processor request cycle sequence is shown in Figure 12-8. 
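
As a quick reference, the flow-control assignment described above can be written as a lookup table. The names below are hypothetical and the sketch covers only the request types named in this paragraph; potential updates issued within a cluster are a separate case described under Clusters.

```python
# Which input signal gates each processor request type (illustrative only).
# None means the request cannot be delayed by RdRdy* or WrRdy*.

FLOW_CONTROL_SIGNAL = {
    "read": "RdRdy*",
    "invalidate": "RdRdy*",
    "update": "RdRdy*",
    "write": "WrRdy*",
    "null write": None,
}


def flow_control_signal(request_type):
    """Return the signal name that controls issue of the request type."""
    return FLOW_CONTROL_SIGNAL[request_type]
```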
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Processor Read Request 

When a processor issues a read request, the external agent must access the 
specified resource and return the requested data. (Processor read requests 
are described in this section; external read requests are described in 
External Requests, later on in this chapter.) 

A processor read request can be split from the external agent's return of 
the requested data; in other words, the external agent can initiate an 
unrelated external request before it returns the response data for a 
processor read. A processor read request is completed after the last word 
of response data has been received from the external agent. 

Note that the data identifier (see System Interface Commands and Data 
Identifiers, in this chapter) associated with the response data can signal 
that the returned data is erroneous, causing the processor to take a bus 
error. 

Processor read requests that have been issued, but for which data has not 
yet been returned, are said to be pending. A read remains pending until the 
requested read data is returned. 

In secondary-cache mode, the external agent must be capable of accepting 
a processor read request followed by a potential update request any time 
all three of the following conditions are met: 

• There is no processor read request pending. 

• There is no unacknowledged processor update request. 

• The signal RdRdy* has been asserted for two or more cycles. 

In no-secondary-cache mode, the external agent must be capable of 
accepting a processor read request any time the following two conditions 
are met: 

• There is no processor read request pending. 

• The signal RdRdy* has been asserted for two or more cycles. 
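
The two sets of conditions above can be combined into one predicate, as an external-agent model might do. This is a minimal sketch; the function name and its parameters are assumptions made for illustration.

```python
# Hypothetical predicate: must the external agent be ready to accept a
# processor read request in the current cycle?

def must_accept_read(secondary_cache_mode, read_pending,
                     update_unacknowledged, rdrdy_asserted_cycles):
    """Apply the read-acceptance conditions for either mode.

    rdrdy_asserted_cycles counts consecutive cycles RdRdy* has been asserted.
    """
    if read_pending:
        return False
    if rdrdy_asserted_cycles < 2:
        return False
    # The unacknowledged-update condition applies only in secondary-cache mode.
    if secondary_cache_mode and update_unacknowledged:
        return False
    return True
```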
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Processor Write Request 

When a processor issues a write request, the specified resource is accessed 
and the data is written to it. (Processor write requests are described in this 
section; external write requests are described in External Requests, later on 
in this chapter.) 

A processor write request is complete after the last word of data has been 
transmitted to the external agent. 

In secondary-cache mode, the external agent must be capable of accepting 
a processor write request any time all three of the following conditions are 
met: 

• There is no processor read request pending. 

• There is no unacknowledged processor update request that is 
compulsory. 

• The signal WrRdy* has been asserted for two or more cycles. 

In no-secondary-cache mode, the external agent must be capable of 
accepting a processor write request any time the following two conditions 
are met: 

• No processor read request is pending. 

• The signal WrRdy* has been asserted for two or more cycles. 
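
The write-acceptance conditions can be sketched in the same way. The names are hypothetical; note that in secondary-cache mode only an unacknowledged update that has become compulsory blocks a write.

```python
# Hypothetical predicate: must the external agent be ready to accept a
# processor write request in the current cycle?

def must_accept_write(secondary_cache_mode, read_pending,
                      compulsory_update_unacknowledged,
                      wrrdy_asserted_cycles):
    """Apply the write-acceptance conditions for either mode.

    wrrdy_asserted_cycles counts consecutive cycles WrRdy* has been asserted.
    """
    if read_pending or wrrdy_asserted_cycles < 2:
        return False
    if secondary_cache_mode and compulsory_update_unacknowledged:
        return False
    return True
```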
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Processor Invalidate Request 

An invalidate request notifies all processors that the specified cache line 
must be marked invalid in all caches in the system. Invalidate requests can 
only be used in a multiprocessing system. 

When a processor issues an invalidate request, the specified resource is 
accessed and the line is marked invalid. (Processor invalidate requests are 
described in this section; external invalidate requests are described in 
External Requests, later on in this chapter.) 

A processor invalidate request requires a completion acknowledge by 
either the invalidate acknowledge signal IvdAck* or the invalidate error 
signal IvdErr*, unless the invalidate is canceled by the external agent. A 
processor invalidate request that has been submitted, but for which the 
processor has not yet received an acknowledge or a cancellation, is said to 
be unacknowledged. When the processor invalidate request fails (IvdErr* is 
asserted), the issuing processor takes a bus error on the store instruction 
that generated the failed request. Figure 12-10 shows a sample processor 
invalidate/update request cycle. 

Invalidate cancellation is signaled to the processor during external 
invalidate, update, snoop, and intervention requests; IvdErr* signals that a 
processor invalidate request has failed. 

A completion acknowledge for processor invalidate requests is signaled 
through the System interface on dedicated pins, and this acknowledgment 
can occur in parallel with processor and external requests. 

State changes in the external system are not instantaneously reflected in 
the caches of every processor, which means an external agent can discover 
that a processor request for an invalidate cannot be completed. For 
example, a processor store can hit on a shared cache line and issue an 
invalidate to the external agent. However, before the external agent can 
transmit the invalidate to the rest of the system another invalidate for the 
same cache line can be received by the external agent. If this occurs, the 
processor cache does not reflect the current state of the system and the 
processor invalidate must not be transmitted to the system; instead, the 
external agent must cancel the processor unacknowledged invalidate. 
Figure 12-9 shows this cancellation cycle. 
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Figure 12-9 Cancelling an Invalidate Request 

The steps shown in Figure 12-9 are described below: 

1. The processor issues an invalidate on a store hit to a shared line in 
its cache. 

2. An invalidate request, coming from the system bus, is received by 
the processor's external agent targeting the same cache line. 

3. The external invalidate invalidates the cache line, and the 
processor invalidate request is cancelled. 

4. The processor re-examines the state of the cache line and 
discovers that the cache line that was the target of the store is now 
invalid. The processor issues a processor read request to service 
the store miss. 
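
The four steps above can be traced with a small model. This is an illustrative sketch, not a description of the hardware: the function name, the state strings, and the event strings are all assumptions.

```python
# Hypothetical trace of a store that hits a shared line, with or without a
# competing external invalidate arriving before the processor's invalidate
# has been forwarded to the system.

def store_hit_on_shared_line(external_invalidate_first):
    """Return (final line state, list of events) for the scenario."""
    line_state = "shared"
    events = ["processor invalidate issued"]
    if external_invalidate_first:
        # Steps 2 and 3: the external invalidate wins and cancels ours.
        line_state = "invalid"
        events.append("external invalidate received; line marked invalid")
        events.append("processor invalidate cancelled")
        # Step 4: the store now misses, so the line is fetched again.
        events.append("processor read request issued for store miss")
    else:
        events.append("waiting for IvdAck* or IvdErr*")
    return line_state, events
```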
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Processor Update Request 

An update request notifies all processors that a specified cache line in all 
caches throughout a multiprocessor system must be replaced with 
modified data. An update request can only be used in a multiprocessing 
system. 

When a processor issues an update request, the specified resource is 
accessed and the line is updated. (Processor update requests are described 
in this section; external update requests are described in External 
Requests, later on in this chapter.) 

A processor update request requires a completion acknowledge by either 
the invalidate acknowledge signal IvdAck* or the invalidate error signal 
IvdErr* (shown in Figure 12-10), unless the update is canceled by the 
external agent. A processor update request that has been submitted, but 
for which the processor has not yet received an acknowledge or a 
cancellation, is said to be unacknowledged. When the processor update 
request fails (IvdErr* is asserted), the issuing processor takes a bus error 
on the store instruction that generated the failed request. Figure 12-10 
shows a sample processor invalidate/update request cycle. 

[Figure: an R4000 and its external agent exchanging requests over the 
system bus with a second external agent and R4000. Labeled steps: 
1. processor update or invalidate request; 3. external update or 
invalidate request; 5. IvdAck* or IvdErr*.] 


Figure 12-10 Processor Update/Invalidate Acknowledge Cycle 

Update cancellation is signaled to the processor during external 
invalidate, update, snoop, and intervention requests; IvdErr* signals that a 
processor update request has failed. 

Since a completion acknowledge for processor update requests is signaled 
through the System interface on dedicated pins, this acknowledgment can 
occur in parallel with processor and external requests. 
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Clusters 

A cluster consists of a single processor read request, followed by one or 
two additional processor requests that are issued while the initial read 
request is pending. 

The processor supports three types of clusters: 

• a processor read request, followed by a write request 

• a processor read request, followed by potential update request 

• a processor read request, followed by a potential update 
request, followed by a write request. 

In secondary-cache mode, the processor issues individual requests (as in 
no-secondary-cache mode), or cluster requests. All requests in the cluster 
must be accepted before the response to the read request that began the 
cluster can be returned to the processor. 

Potential update requests within a cluster can be disabled through the 
boot-time mode control interface. 

Read With Write Forthcoming Request as Part of a Cluster 

The processor signals that it is issuing a cluster containing a processor 
write request by issuing a read-with-write-forthcoming request, instead of 
starting the cluster with an ordinary read request. The read-with-write- 
forthcoming request is identified by a bit in the command for processor 
read requests. 

The external agent must accept all requests that form the cluster before it 
can respond to the read request that began the cluster. The behavior of the 
processor is undefined if the external agent returns a response to a 
processor read request before accepting all of the requests of the cluster. 

Potential Update as Part of a Cluster 

Potential updates are identified by setting a bit in the processor update 
command. A processor potential update request is any update request 
that is issued while a processor read request is pending. 

Once the processor issues a read request, a potential update request 
follows, regardless of the state of RdRdy*. Potential update requests do 
not obey the RdRdy* flow control rules for issuance, but rather issue with 
a single address cycle regardless of the state of RdRdy*. 
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A processor potential update request remains potential until the read 
response to the pending processor read request which began the cluster is 
received by the external agent. 

• If the read response data is returned in one of the shared 
states (shared or dirty shared), the potential update becomes 
compulsory and is no longer potential. A compulsory update 
must receive an acknowledge either by the signal IvdAck* or 
IvdErr*. 

• If the read response data is returned in one of the exclusive 
states (clean exclusive or dirty exclusive), the potential update is 
nullified and the processor neither expects nor requires an 
acknowledge. 
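
The fate of a potential update depends only on the state in which the read response data is returned. The sketch below expresses that decision; the function name and the state strings are hypothetical labels chosen for illustration.

```python
# Hypothetical resolution of a potential update once the read response for
# the cluster's read request is known.

SHARED_STATES = {"shared", "dirty shared"}
EXCLUSIVE_STATES = {"clean exclusive", "dirty exclusive"}


def resolve_potential_update(read_response_state):
    """Return 'compulsory' or 'nullified' for the given response state."""
    if read_response_state in SHARED_STATES:
        return "compulsory"  # must be acknowledged by IvdAck* or IvdErr*
    if read_response_state in EXCLUSIVE_STATES:
        return "nullified"   # no acknowledge expected or required
    raise ValueError("unexpected cache state: " + read_response_state)
```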

Write Request as Part of a Cluster 

A write request that is part of a cluster obeys the WrRdy* timing rules for 
issuing, as shown earlier in Figure 12-3. 

Null Write Request as Part of a Cluster 

Since the processor accepts external requests between the issue of a read- 
with-write-forthcoming request that begins a cluster and the issue of the 
write request that completes a cluster, it is possible for an external request 
to eliminate the need for the write request in the cluster. For example, if 
the external agent issued an external invalidate request that targeted the 
cache line the processor was attempting to write back, the state of the 
cache line would be changed to invalid and the write back for the cache 
line would no longer be needed. In this event, the processor issues a 
processor null write request after completing the external request to 
complete the cluster. 

Processor null write requests do not obey the WrRdy* flow control rules 
for issuance, rather they issue with a single address cycle regardless of the 
state of WrRdy*. Any external request that changes the state of a cache 
line from dirty exclusive or dirty shared to clean exclusive, shared, or 
invalid obviates the need for a write back of that cache line. 
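
The choice between a write and a null write at the end of a cluster follows from the state of the line after any intervening external request. A minimal sketch, using hypothetical names and state strings:

```python
# Hypothetical decision: which request completes a cluster that began with
# a read-with-write-forthcoming request?

DIRTY_STATES = {"dirty exclusive", "dirty shared"}


def cluster_closing_request(state_before, state_after):
    """Return 'write' or 'null write'.

    state_before: line state when the cluster was started (must be dirty).
    state_after:  line state after any external request has been handled.
    """
    if state_before not in DIRTY_STATES:
        raise ValueError("a write-back cluster starts from a dirty line")
    # A line that is no longer dirty does not need to be written back.
    return "write" if state_after in DIRTY_STATES else "null write"
```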
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External Requests 

External requests include read, write, invalidate, update, snoop, 
intervention, and null requests, as shown in Figure 12-11. External 
invalidate, update, snoop and intervention requests, as a group, are 
referred to as external coherence requests. This section also includes a 
description of read response, a special case of an external request. 
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Figure 12-11 External Requests 



Read request asks for a word of data from the processor's internal resource. 

Write request provides a word of data to be written to the processor's 
internal resource. 

Invalidate request specifies a cache line, in the primary and secondary 
caches of the processor, that must be marked invalid. 

Update request provides a doubleword, partial doubleword, word, or 
partial word of data to be written to the processor's primary and 
secondary caches. 

Snoop request checks the processor's secondary cache to see if a valid copy 
of a particular cache line exists. If a valid copy exists, the processor returns 
the state of the cache line at the specified physical address in the secondary 
cache, and can modify the state of the cache line. 

Intervention request requires the processor to return the state of the 
secondary cache line at the specified physical address. Under certain 
conditions related to the state of the cache line and the nature of the 
intervention request, the contents of the primary and secondary cache line 
can be returned. The state of the line can also be modified by this request. 
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Null request requires no action by the processor; it provides a mechanism 
for the external agent to either return control of the secondary cache to the 
processor, or return the System interface to the master state without 
affecting the processor. 

Table 12-2 lists the external requests that each type of R4000 can receive 
(an X indicates the request is supported on that model). 



Table 12-2 Supported External Requests

Request Type                        R4000PC   R4000SC   R4000MC
----------------------------------  -------   -------   -------
External Read                          X         X         X
External Write                         X         X         X
External Null (System interface)       X         X         X
External Null (Secondary Cache)                  X         X
External Invalidate                                        X
External Update                                            X
External Snoop                                             X
External Intervention                                      X




The processor controls the flow of external requests through the 
arbitration signals ExtRqst* and Release*, as shown in Figure 12-12. The 
external agent must acquire mastership of the System interface before it is 
allowed to issue an external request; the external agent arbitrates for 
mastership of the System interface by asserting ExtRqst* and then waiting 
for the processor to assert Release* for one cycle. 
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Figure 12-12 
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Mastership of the System interface always returns to the processor after an 
external request is issued. The processor does not accept a subsequent 
external request until it has completed the current request. The processor 
accepts external requests between the issue of a processor read request (or 
a processor read request followed by a potential update request) and the 
issue of a processor write request within a cluster. 

If there are no processor requests pending, the processor decides, based on 
its internal state, whether to accept the external request, or to issue a new 
processor request. The processor can issue a new processor request even if 
the external agent is requesting access to the System interface. 

The external agent asserts ExtRqst* indicating that it wishes to begin an 
external request. The external agent then waits for the processor to signal 
that it is ready to accept this request by asserting Release*. The processor 
signals that it is ready to accept an external request based on the criteria 
listed below. 

• The processor completes any processor request or processor 
request cluster that is in progress. 

• While waiting for the assertion of RdRdy* to issue a processor 
read request, the processor can accept an external request if the 
request is delivered to the processor one or more cycles before 
RdRdy* is asserted. 

• While waiting for the assertion of WrRdy* to issue a processor 
write request, the processor can accept an external request 
provided the request is delivered to the processor one or more 
cycles before WrRdy* is asserted. 

• If waiting for the response to a read request after the processor 
has made an uncompelled change to a slave state, the external 
agent can issue an external request before providing the read 
response data. 
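
From the external agent's side, arbitration reduces to asserting ExtRqst* and watching for Release*. The sketch below models that wait over a recorded per-cycle trace; the function name and the trace representation are assumptions, and the model ignores the criteria the processor uses internally to decide when to assert Release*.

```python
# Hypothetical model of the external agent waiting for Release*.
# release_asserted[i] is True if the processor asserts Release* in cycle i.

def first_release_cycle(release_asserted, request_cycle):
    """Return the first cycle at or after request_cycle in which Release*
    is asserted, or None if it is never asserted within the trace."""
    for cycle in range(request_cycle, len(release_asserted)):
        if release_asserted[cycle]:
            return cycle
    return None
```

With the trace `[False, False, False, True, False]` and ExtRqst* asserted from cycle 1, the agent sees Release* in cycle 3.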
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External Read Request 

In contrast to a processor read request, data is returned directly in 
response to an external read request; no other requests can be issued until 
the processor returns the requested data. An external read request is 
complete after the processor returns the requested word of data. 

The data identifier (see System Interface Commands and Data Identifiers, 
in this chapter) associated with the response data can signal that the 
returned data is erroneous, causing the processor to take a bus error. 

NOTE: The processor does not contain any resources that are 
readable by an external read request; in response to an external read 
request the processor returns undefined data and a data identifier 
with its Erroneous Data bit, SysCmd(5), set. 

External Write Request 

When an external agent issues a write request, the specified resource is 
accessed and the data is written to it. An external write request is complete 
after the word of data has been transmitted to the processor. 

The only processor resource available to an external write request is the 
Interrupt register. 

External Invalidate Request 

When an external agent issues an invalidate request, the specified resource 
is accessed and the line is marked invalid. An external invalidate request 
is considered to be complete after the request has been transmitted. 

External Update Request 

When an external agent issues an update request, the specified resource is 
accessed and the line is replaced. An external update request is 
considered complete after the request has been transmitted. 
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External Snoop Request 

An external snoop request makes the processor inspect the secondary 
cache to see if the cache contains a valid version of the specified cache line. 
If the valid cache line is present, the processor reports the cache line state 
and can modify this state. 

An external snoop request is complete after the processor returns the state 
of the specified cache line. 

External Intervention Request 

When an external agent issues an intervention request, the specified 
secondary cache line is inspected. Upon inspection, the cache line state is 
reported and/or modified. Under certain circumstances the specified 
cache line may also be retrieved. The external intervention request is 
complete after one of the following occurs: 

• the processor returns the state of the specified cache line 

• the processor returns the last word of data for the specified 
cache line. 

Read Response 

A read response returns data in response to a processor read request, as 
shown in Figure 12-13. While a read response is technically an external 
request, it has one characteristic that differentiates it from all other 
external requests — it does not perform System interface arbitration. For 
this reason, read responses are handled separately from all other external 
requests, and are simply called read responses. 
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Figure 12-13 Read Response 
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12.5 Handling Requests 

This section details the sequence, protocol, and syntax (see Terminology, in 
this chapter, for definitions of these terms) of both processor and external 
requests. The following system events are discussed: 

• load miss in secondary-cache mode and no-secondary-cache 
mode 

• store miss in secondary-cache mode and no-secondary-cache 
mode 

• store hit 

• uncached loads/stores 

• CACHE operations 

• load linked store conditional. 

Load Miss 

When a processor load misses in both the primary and secondary caches, 
before the processor can proceed it must obtain the cache line that contains 
the data element to be loaded from the external agent. 

If the new cache line replaces a current dirty exclusive or dirty shared 
cache line, the current cache line must be written back before the new line 
can be loaded in the primary and secondary caches. 

The processor examines the coherency attribute (cache coherency 
attributes are described in Chapter 11) in the TLB entry for the page that 
contains the requested cache line, and executes one of the following 
requests: 

• If the coherency attribute is exclusive, the processor issues a 
coherent read request that also requests exclusivity. 

• If the coherency attribute is sharable or update, the processor 
issues a coherent read request. 

• If the coherency attribute is noncoherent, the processor issues a 
noncoherent read request. 
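
The selection above is a direct mapping from the page's coherency attribute to the kind of read request issued on a load miss. A minimal sketch, with hypothetical names and attribute strings:

```python
# Hypothetical mapping from TLB coherency attribute to the read request
# issued on a load miss, following the three bullets above.

LOAD_MISS_READ = {
    "exclusive": "coherent read with exclusivity",
    "sharable": "coherent read",
    "update": "coherent read",
    "noncoherent": "noncoherent read",
}


def load_miss_read_request(coherency_attribute):
    """Return the read request type for a load miss to the given page."""
    return LOAD_MISS_READ[coherency_attribute]
```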

Table 12-3 shows the actions taken on a load miss to primary and 
secondary caches. 




Table 12-3 Load Miss to Primary and Secondary Caches

                                         State of Data Cache Line Being Replaced
                                         ---------------------------------------
Page Attribute         Processor         No-Secondary-Cache     Secondary-Cache
(Write-back policy)    Configuration     Mode                   Mode
                                         Clean/Invalid  Dirty   Clean/Invalid  Dirty
---------------------  ----------------  -------------  -----   -------------  -----
Noncoherent            All R4000 models  NCR            NCR/W   NCR            NCR-W
Exclusive (read and    R4000SC,          N/A            N/A     Rex            Rex-W
write invalidate)      R4000MC
Shareable              R4000MC           N/A            N/A     R              R-W
(write invalidate)
Update                 R4000MC           N/A            N/A     R              R-W
(write update)

NCR     Processor noncoherent block read request
NCR/W   Processor noncoherent block read request followed by processor
        block write request
NCR-W   Cluster. Processor noncoherent block read request with write
        forthcoming followed by processor block write request
R       Processor coherent block read request
R-W     Cluster. Processor coherent block read request with write
        forthcoming followed by processor block write request
Rex     Processor coherent block read request with exclusivity
Rex-W   Cluster. Processor coherent block read request with exclusivity
        and write forthcoming followed by processor block write request
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Secondary-Cache Mode 

In secondary-cache mode, if the current cache line does not have to be 
written back and the coherency attribute for the page that contains the 
requested cache line is not exclusive, the processor issues a coherent block 
read request for the cache line that contains the data element to be loaded. 

If the current cache line needs to be written back and the coherency 
attribute for the requested cache line is sharable or update, the processor 
issues a cluster. The cluster consists of a coherent block read-with-write- 
forthcoming request for the cache line that contains the data element to be 
loaded, followed by a block write request for the current cache line. 

If the current cache line needs to be written back and the coherency attribute 
for the page containing the requested cache line is exclusive, the processor 
issues a cluster consisting of an exclusive read-with-write-forthcoming 
request, followed by a write request for the current cache line. 

Table 12-3 lists these actions. 

No-Secondary-Cache Mode 

In no-secondary-cache mode, if the cache line must be written back on a 
load miss, the read request is issued and completed before the write 
request is handled. The processor takes the following steps: 

1. The processor issues a noncoherent read request for the cache line 
that contains the data element to be loaded. 

2. The processor then waits for an external agent to provide the read 
response. 

If the current cache line must be written back, the processor issues a write 
request to save the dirty cache line in memory. 
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Store Miss 

When a processor store misses in both the primary and secondary caches, 
the processor must obtain, from the external agent, the cache line that 
contains the target location of the store. The processor examines the 
coherency attribute in the TLB entry for the page (TLB page coherency 
attributes are listed in Chapter 4) that contains the requested cache line to 
see if the cache line is being maintained with either a write invalidate or a 
write update cache coherency protocol. 

The processor then executes one of the following requests: 

• If the coherency attribute is either sharable or exclusive, a write 
invalidate protocol is in effect, and a coherent block read that 
requests exclusivity is issued. 

• If the coherency attribute is update, a write update protocol is in 
effect and a coherent block read request is issued. 

• If the coherency attribute is noncoherent, a noncoherent block 
read request is issued. 

Table 12-4 shows the actions taken on a store miss to primary and 
secondary caches. 
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Table 12-4 Store Miss to Primary and Secondary Caches 



                                         State of Data Cache Line Being Replaced
                                         ---------------------------------------
Page Attribute         Processor         No-Secondary-Cache     Secondary-Cache
(Write-back Policy)    Configuration     Mode                   Mode
                                         Clean/Invalid  Dirty   Clean/Invalid  Dirty
---------------------  ----------------  -------------  -----   -------------  --------------
Noncoherent            All R4000 models  NCR            NCR/W   NCR            NCR-W
Exclusive              R4000SC,          N/A            N/A     Rex            Rex-W
(write invalidate)     R4000MC
Shareable              R4000MC           N/A            N/A     Rex            Rex-W
(write invalidate)
Update                 R4000MC           N/A            N/A     Dis(1): R/U    Dis(1): R-W/U
(write update)                                                  En(2): R-PU    En(2): R-PU-W

NCR     Processor noncoherent block read request
NCR/W   Processor noncoherent block read request followed by processor
        block write request
NCR-W   Cluster. Processor noncoherent block read request with write
        forthcoming followed by processor block write request
Rex     Processor coherent block read request with exclusivity
Rex-W   Cluster. Processor coherent block read request with exclusivity
        and write forthcoming followed by processor block write request
R/U     Processor coherent block read request followed by processor
        update request (if read response data is shared or dirty shared)
R-PU    Cluster. Processor coherent block read request followed by
        processor potential update request
R-PU-W  Cluster. Processor coherent block read request followed by
        processor potential update request, followed by processor block
        write request
R-W/U   Cluster. Processor coherent block read request with write
        forthcoming followed by processor block write request, followed
        by processor update request (if read response data is shared or
        dirty shared)
Dis(1)  Potential update disable [Modebit(20): PotUpdDis = 1]
En(2)   Potential update enable [Modebit(20): PotUpdDis = 0]
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Secondary-Cache Mode 

In secondary-cache mode, if the new cache line replaces a current cache 
line that is in either the dirty exclusive or dirty shared state, the current 
cache line must be written back before the new line can be loaded in the 
primary and secondary caches. The processor requests issued are a 
function of the page attributes listed below. 

Noncoherent Page Attribute 

If the current cache line must be written back, and the coherency attribute 
for the requested cache line is noncoherent, the processor issues a cluster 
consisting of a noncoherent block read-with-write-forthcoming request 
for the cache line that contains the store target location, followed by a 
block write request for the current cache line. 

If the current cache line does not need to be written back and the 
coherency attribute for the page that contains the requested cache line is 
noncoherent, the processor issues a noncoherent block read request for the 
cache line that contains the store target location. 

Sharable or Exclusive Page Attribute 

If the current cache line must be written back and the coherency attribute 
for the page that contains the requested cache line is sharable or exclusive, 
the processor issues a cluster consisting of a coherent block read request 
with exclusivity and write forthcoming, followed by a processor block 
write request for the current cache line. 

If the current cache line does not need to be written back and the coherency 
attribute for the page that contains the requested cache line is sharable or 
exclusive, the processor issues a coherent block read request that also 
requests exclusivity. 

Update Page Attribute 

If the current cache line must be written back and the coherency attribute 
for the page that contains the requested cache line is update, and potential 
updates are enabled, the processor issues a cluster consisting of a coherent 
block read-with-write-forthcoming request, followed by a potential 
update request, followed by a write request for the current cache line. 
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If the current cache line does not need to be written back, the coherency 
attribute for the page that contains the requested cache line is update, and 
potential updates are enabled, the processor issues a cluster consisting of 
a read request, followed by a potential update request. 
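
The secondary-cache-mode store-miss cases in this section can be gathered into one function that returns the requests the processor issues up front. This is an illustrative sketch based on the text and Table 12-4; the function name, parameters, and request strings are assumptions. When potential updates are disabled, any update request is issued only after the read response arrives in a shared state, so it is not part of the returned list.

```python
# Hypothetical summary of store-miss request sequences in secondary-cache
# mode. victim_dirty is True when the line being replaced must be written
# back (dirty exclusive or dirty shared).

def store_miss_requests(attribute, victim_dirty,
                        potential_updates_enabled=True):
    """Return the ordered list of requests issued for a store miss."""
    if attribute == "noncoherent":
        base, exclusive = "noncoherent block read", False
    elif attribute in ("sharable", "exclusive"):
        base, exclusive = "coherent block read", True  # write invalidate
    elif attribute == "update":
        base, exclusive = "coherent block read", False  # write update
    else:
        raise ValueError("unknown coherency attribute: " + attribute)

    qualifiers = []
    if exclusive:
        qualifiers.append("exclusivity")
    if victim_dirty:
        qualifiers.append("write forthcoming")
    read = base + (" with " + " and ".join(qualifiers) if qualifiers else "")

    requests = [read]
    if attribute == "update" and potential_updates_enabled:
        requests.append("potential update")
    if victim_dirty:
        requests.append("block write")
    return requests
```

For example, an update page with a dirty victim line and potential updates enabled yields the three-request cluster: read with write forthcoming, potential update, block write.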

In an update protocol, the cache line requested by a processor coherent 
read request can be returned in a shared state; the processor then has to 
issue an update request before it can complete a store instruction. A 
potential update issued with a read request in a cluster allows the external 
agent to anticipate the read response on the system bus. If the read 
response is in a shared state, the required update is quickly transmitted to 
the rest of the system. This provides the processor with the acknowledge 
and allows the processor to complete the store instruction as rapidly as 
possible. 

Without the potential update request, the response data must be returned 
to the processor. If the line is returned in the shared or dirty shared state, 
the processor issues an update request, which must then be forwarded to 
the system bus before an acknowledge can be returned to the processor. 

Note that potential updates behave as if they have not yet been issued by 
the processor. Potential updates are not subject to cancellation, and do not 
require an acknowledge. When a potential update is nullified, the 
processor behaves as if no update request was ever issued; when a 
potential update becomes compulsory, the processor behaves as if it had 
issued an update request at that instant. 

Compulsory Update: If the processor issues a cluster that contains a 
potential update, and the response data for the read request is 
returned with an indication that it must be placed in the cache in either 
a shared or dirty shared state, the potential update then becomes 
compulsory. Once a potential update becomes compulsory, it is 
subject to cancellation, and the processor requires an acknowledge for 
the update request. The external agent must forward the update to the 
system, then signal the acknowledge to the processor when the update 
is complete. The processor will not complete the store until it has 
received an acknowledge for the update request. 
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Nullifying a Potential Update: If the processor issues a cluster that 
contains a potential update, and the response data for the read request 
is returned in either a clean exclusive or dirty exclusive state, the 
potential update is nullified. Once a potential update has been 
nullified, the external agent must discard the update. The processor 
does not wait for or expect an acknowledge to a potential update that 
has been nullified. It is not correct to assert either IvdAck* or IvdErr* 
in this situation. 

If the read response data is returned in either the clean exclusive or dirty 
exclusive state, the processor cannot issue an update request. It is 
assumed that the external agent will take the appropriate action to change 
the state of the cache line to invalid in other caches. 
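
The fate of a potential update depends only on the state in which the read response data is returned. A minimal sketch of that decision follows; the function name, state strings, and dictionary keys are assumptions chosen for illustration.

```python
def resolve_potential_update(response_state):
    """Classify a potential update once the read response state is known."""
    if response_state in ("shared", "dirty shared"):
        # The update becomes compulsory: it can be cancelled, and the
        # processor waits for an acknowledge before completing the store.
        return {"outcome": "compulsory", "ack_required": True, "cancellable": True}
    if response_state in ("clean exclusive", "dirty exclusive"):
        # The update is nullified: the external agent discards it and must
        # not assert IvdAck* or IvdErr*.
        return {"outcome": "nullified", "ack_required": False, "cancellable": False}
    raise ValueError("unexpected read response state: %r" % (response_state,))
```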

An external request indicating processor update cancellation can be issued 
when a processor read is not pending or when a compulsory update is
unacknowledged. Processor state is undefined if a cancellation is signaled
on an external coherence request to the processor when a processor read 
is pending, or there is no unacknowledged compulsory update. 

No-Secondary-Cache Mode 

The processor issues a read request for the cache line that contains the data 
element to be loaded, then waits for the external agent to provide read data
in response to the read request. Then, if the current cache line must be 
written back, the processor issues a write request for the current cache line. 

In no-secondary-cache mode, if the new cache line replaces a current cache 
line whose Write back (W) bit is set, the current cache line moves to an 
internal write buffer before the new cache line is loaded in the primary 
cache. 
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Store Hit 

This section describes store hits in both secondary-cache and no- 
secondary-cache mode. 

Secondary-Cache Mode 

When the processor hits in the secondary cache, on a line that is marked 
either shared or dirty shared, the processor must issue an update or 
invalidate request and then wait to receive an acknowledge, before the 
store is complete. The processor checks the coherency attribute in the TLB 
for the page containing the cache line that is the target of the store, to
determine if the cache line is managed by either a write invalidate or write
update cache coherency protocol.

• If the coherency attribute is sharable or exclusive, a write
invalidate protocol is in effect, and the processor issues an
invalidate request. The processor cannot complete the store
until the external agent signals an acknowledge for this
invalidate request.

• If the coherency attribute is update, a write update protocol is
in effect, and the processor issues an update request. The
processor cannot complete the store until the external agent
signals an acknowledge for this update request.

No-Secondary-Cache Mode 

In no-secondary-cache mode, all lines are set to the dirty exclusive state. 
This means store hits cause no bus transactions. 
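
The store hit behavior in both modes can be sketched as one function. This is an illustration only: the names are hypothetical, and returning None for a line held in an exclusive state is an assumption inferred from the text, which only requires a request for lines marked shared or dirty shared.

```python
def store_hit_request(secondary_cache_mode, line_state, attribute):
    """Return the request a store hit must issue, or None if there is none."""
    if not secondary_cache_mode:
        # All lines are dirty exclusive, so no bus transaction is needed.
        return None
    if line_state not in ("shared", "dirty shared"):
        # Assumption: exclusive lines need no coherence request.
        return None
    if attribute in ("sharable", "exclusive"):
        return "invalidate"   # write invalidate protocol
    if attribute == "update":
        return "update"       # write update protocol
    raise ValueError("unexpected coherency attribute: %r" % (attribute,))
```

In either of the two request cases the store completes only after the external agent signals an acknowledge.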

Uncached Loads or Stores 

When the processor performs an uncached load, it issues a noncoherent 
doubleword, partial doubleword, word, or partial word read request. 
When the processor performs an uncached store, it issues a doubleword,
partial doubleword, word, or partial word write request. 

External requests have a higher priority than uncached stores. When 
using the uncached store buffer on an R4400 processor, it is possible for the 
external agent to receive cached and uncached stores out of program 
order, as the example below illustrates. Figure 12-14 shows a cached and
uncached store instruction sequence:

    SW r2, (r3)    # uncached store
    SW r4, (r5)    # cached store

Figure 12-14 R4400 Processor Cached and Uncached Store Sequence



Referring to Figure 12-14, suppose an external intervention or snoop is
issued to the R4400 processor while the uncached store is still in the store
buffer (the uncached store data has not yet been stored off-chip). The
cached store from Figure 12-14 has hit in the primary cache and is in the
tag check (TC) stage of the pipeline (see Chapter 3 for a description of the
pipeline stages). In this case, the external agent sees the state of the
internal caches after the cached store but before the result of the uncached
store is available off the chip. Figure 12-15 shows how a SYNC instruction
can force the uncached store to occur before the cached store.



    SW r2, (r3)    # uncached store
    SYNC
    SW r4, (r5)    # cached store

Figure 12-15 R4400 Processor Cached and Uncached Stores, Using SYNC
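
The ordering effect of SYNC can be illustrated with a toy model. The sketch below is not a model of the R4400 pipeline. It assumes the worst case described above, in which an uncached store stays in the store buffer until a SYNC (or the end of the sequence) drains it, while a cached store becomes visible to snoops immediately. The function name and the operation tuples are hypothetical.

```python
def off_chip_visibility_order(program):
    """Return store names in the order an external agent could observe them.

    program is a list of ("uncached_store", name), ("cached_store", name),
    or ("sync",) tuples, in program order.
    """
    store_buffer = []   # uncached stores not yet written off-chip
    visible = []        # stores whose effect the external agent can see
    for op in program:
        kind = op[0]
        if kind == "uncached_store":
            store_buffer.append(op[1])
        elif kind == "cached_store":
            visible.append(op[1])
        elif kind == "sync":
            # SYNC forces buffered uncached stores out before continuing.
            visible.extend(store_buffer)
            store_buffer.clear()
        else:
            raise ValueError("unknown operation: %r" % (kind,))
    visible.extend(store_buffer)  # the buffer eventually drains
    return visible
```

With the sequence of Figure 12-14 the model reports the cached store first; inserting ("sync",) between the two stores, as in Figure 12-15, restores program order.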



CACHE Operations 

The processor provides a variety of CACHE operations to maintain the 
state and contents of the primary and secondary caches. During the 
execution of the CACHE operation instructions, the processor can issue 
either write requests or invalidate requests. 

Load Linked Store Conditional Operation 

Generally, the execution of a Load Linked Store Conditional instruction 
sequence is not visible at the System interface; that is, no special requests 
are generated due to the execution of this instruction sequence. 

There is, however, one situation in which the execution of a Load Linked 
Store Conditional instruction sequence is visible, as indicated by the link 
address retained bit during a processor read request, as programmed by the 
SysCmd(2) bit. This situation occurs when the data location targeted by a 
Load Linked Store Conditional instruction sequence maps to the same 
cache line to which the instruction area containing the Load Linked Store 
Conditional code sequence is mapped. In this case, immediately after 
executing the Load Linked instruction, the cache line that contains the link
location is replaced by the instruction line containing the code. The link
address is kept in a register separate from the cache, and remains active as 
long as the link bit, set by the Load Linked instruction, is set. 

The link bit, which is set by the load linked instruction, is cleared by a 
change of cache state for the line containing the link address, or by a 
Return From Exception. 

In order for the Load Linked Store Conditional instruction sequence to 
work correctly, all coherency traffic targeting the link address must be 
visible to the processor, and the cache line containing the link location 
must remain in a shared state in every cache in the system. This 
guarantees that a Store Conditional executed by some other processor is 
visible to the processor as a coherence request, changing the state of the 
cache line containing the link location. 

To accomplish this, when a read request issued by the processor causes the
cache line containing the link location to be replaced, the link address
retained bit is set in that request, indicating the link address is being
retained. This informs the external agent that, although the processor has 
replaced this cache line, the processor must still see any coherence traffic 
that targets this cache line. 

Any snoop or intervention request that targets a cache line which is not
present in the cache, but for which the snoop or intervention address
matches the current link address while the link bit is set, returns an
indication that the cache line is present in the cache in a shared state. This
is consistent with the coherency model, since the processor never returns
data, in response to an intervention request, for a cache line that is in the
shared state. The shared response guarantees that the cache line
containing the link location remains in a shared state in all other
processors' caches, and therefore that any other processor attempting a
store conditional to this link location must issue a coherence request in
order to complete the store conditional.
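
The snoop and intervention response rule for a retained link address can be sketched as follows. The function and parameter names are hypothetical, addresses are assumed to be already reduced to cache-line addresses, and reporting "invalid" for a line that is absent and not linked is an assumption made to complete the example.

```python
def snoop_reported_state(cache_state, request_addr, link_addr, link_bit):
    """Return the line state reported for a snoop or intervention request.

    cache_state is the state of the line in the cache, or None if the
    line is not present.
    """
    if cache_state is not None:
        return cache_state          # line is present: report its real state
    if link_bit and request_addr == link_addr:
        # Line was replaced, but the link address is retained: claim shared
        # so that other processors must issue a coherence request.
        return "shared"
    return "invalid"                # assumption: absent and not linked
```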

For more information, refer to Chapter 11, or see the specific Load Linked 
and Store Conditional instructions described in Appendix A. 
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12.6 Processor and External Request Protocols 

The following sections contain a cycle-by-cycle description of the bus 
arbitration protocols for each type of processor and external request.
Table 12-5 lists the abbreviations and definitions for each of the buses that 
are used in the timing diagrams that follow. 



Table 12-5 System Interface Requests

Scope       Abbreviation  Meaning
----------  ------------  ------------------------------------------------
Global      Unsd          Unused
SysAD bus   Addr          Physical address
            Data<n>       Data element number n of a block of data
SysCmd bus  Cmd           An unspecified System interface command
            Read          A processor or external read request command
            RwWF          A processor read-with-write-forthcoming request
                          command
            Write         A processor or external write request command
            Null          A processor null request command
            SINull        A System interface release external null request
                          command
            SCNull        A secondary cache release external null request
                          command
            Ivd           A processor or external invalidate request
                          command
            Upd           A processor or external update request command
            Ivtn          An external intervention request command
            Snoop         An external snoop request command
            NData         A noncoherent data identifier for a data element
                          other than the last data element
            NEOD          A noncoherent data identifier for the last data
                          element
            CData         A coherent data identifier for a data element
                          other than the last data element
            CEOD          A coherent data identifier for the last data
                          element

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Processor Request Protocols 

Processor request protocols described in this section include: 

• read 

• write 

• invalidate and update 

• null write 

• cluster 

NOTE: In the timing diagrams, the two closely spaced, wavy vertical
lines (such as those shown in Figure 12-16) indicate one or more
identical cycles which are not illustrated due to space constraints.




Figure 12-16 Symbol for Undocumented Cycles 



Processor Read Request Protocol 

The following sequence describes the protocol for a processor read request 
(the numbered steps below correspond to Figures 12-17 and 12-18). 

1. RdRdy* is asserted low, indicating the external agent is ready to
accept a read request.

2. With the System interface in master state, a processor read request is 
issued by driving a read command on the SysCmd bus and a read 
address on the SysAD bus. 

3. At the same time, the processor asserts ValidOut* for one cycle, 
indicating valid data is present on the SysCmd and the SysAD buses. 

NOTE: Only one processor read request can be pending at a time. 

4. The processor makes an uncompelled change to slave state either at
the issue cycle of the read request, or sometime after the issue cycle of 
the read request by asserting the Release* signal for one cycle. 
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NOTE: The external agent must not assert the signal ExtRqst* for the 
purposes of returning a read response, but rather must wait for the
uncompelled change to slave state. The signal ExtRqst* can be asserted
before or during a read response to perform an external request other 
than a read response. 

5. The processor releases the SysCmd and the SysAD buses one SCycle 
after the assertion of Release*. 

6. The external agent drives the SysCmd and the SysAD buses within 
two cycles after the assertion of Release*. 

Once in slave state (starting at cycle 5 in Figure 12-17), the external agent 
can return the requested data through a read response. The read response 
can return the requested data or, if the requested data could not be
successfully retrieved, an indication that the returned data is erroneous. If
the returned data is erroneous, the processor takes a bus error exception. 

Figure 12-17 illustrates a processor read request, coupled with an 
uncompelled change to slave state, that occurs as the read request is 
issued. Figure 12-18 illustrates a processor read request, and the 
subsequent uncompelled change to slave state, that occurs sometime after 
the read request is issued. 

NOTE: Timings for the SysADC and SysCmdP buses are the same as 
those of the SysAD and SysCmd buses, respectively. 



[Timing diagram omitted. Signals shown: SCycle, SClock, SysAD bus, SysCmd bus,
ValidOut*, ValidIn*, RdRdy*, WrRdy*, Release*; the System interface changes
from master state to slave state.]

Figure 12-17 Processor Read Request Protocol

[Timing diagram omitted.]

Figure 12-18 Processor Read Request Protocol, Change to Slave State Delayed

When the following three events occur — a read request is pending, 
ExtRqst* is asserted, and Release* is asserted for one cycle — it may be 
unclear if the assertion of Release* is in response to ExtRqst*, or 
represents an uncompelled change to slave state. The only situation in 
which the assertion of Release* cannot be considered an uncompelled 
change to slave state is if the following three conditions exist 
simultaneously: 

• the System interface is operating in secondary-cache mode 

• the read request was a read-with-write-forthcoming request 

• the expected write request has not been issued by the
processor.

If these three conditions exist, the processor cannot accept the read 
response; rather, it accepts the external request. The write request must be 
accepted by the external agent before the read response can be issued. 

In all other cases, the assertion of Release* indicates either an 
uncompelled change to slave state, or a response to the assertion of 
ExtRqst*, whereupon the processor accepts either a read response, or any 
other external request. If any external request other than a read response 
is issued, the processor performs another uncompelled change to slave 
state, asserting Release*, after processing the external request.
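
The rule for interpreting Release* while a read is pending reduces to a single predicate. A minimal sketch, with hypothetical function and parameter names:

```python
def release_allows_read_response(secondary_cache_mode, read_was_rwwf, write_issued):
    """Return True if the processor can accept a read response after it
    asserts Release* with a read request pending."""
    # The only blocked case: secondary-cache mode, the read was a
    # read-with-write-forthcoming, and the promised write is still unissued.
    blocked = secondary_cache_mode and read_was_rwwf and not write_issued
    return not blocked
```

When the function returns False, the release can only serve an external request other than the read response, and the external agent must accept the forthcoming write before it issues the read response.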
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Processor Write Request Protocol 

Processor write requests are issued using one of two protocols. 

• Doubleword, partial doubleword, word, or partial word writes
use a word† write request protocol.

• Block writes use a block write request protocol. 

Processor doubleword write requests are issued with the System interface
in master state, as described in the steps below; Figure 12-19 shows
a processor noncoherent single word write request cycle.

1. A processor single word write request is issued by driving a write 
command on the SysCmd bus and a write address on the SysAD bus. 

2. The processor asserts ValidOut*. 

3. The processor drives a data identifier on the SysCmd bus and data on 
the SysAD bus. 

4. The data identifier associated with the data cycle must contain a last 
data cycle indication. At the end of the cycle, ValidOut* is deasserted. 

NOTE: Timings for the SysADC and SysCmdP buses are the same as
those of the SysAD and SysCmd buses, respectively. 



[Timing diagram omitted. Signals shown: SysCmd bus, ValidOut*, ValidIn*,
RdRdy*, WrRdy*, Release*, with the System interface in master state. The
SysAD bus carries Addr followed by Data0.]

Figure 12-19 Processor Noncoherent Single Word Write Request Protocol



† Called word to distinguish it from the block request protocol. Data transferred can actually be
doubleword, partial doubleword, word, or partial word.






Processor block write requests are issued with the System interface in 
master state, as described below; a processor coherent block request for 
eight words of data is illustrated in Figures 12-20 and 12-21. 

1. The processor issues a write command on the SysCmd bus and a write 
address on the SysAD bus. 

2. The processor asserts ValidOut*. 

3. The processor drives a data identifier on the SysCmd bus and data on 
the SysAD bus. 

4. The processor asserts ValidOut* for a number of cycles sufficient to 
transmit the block of data. 

5. The data identifier associated with the last data cycle must contain a 
last data cycle indication. 

NOTE: As shown in Figure 12-21, however, the first data cycle does 
not have to immediately follow the address cycle. 

Figures 12-20 and 12-21 illustrate a processor coherent block request for 
eight words of data. 
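
The address and data cycles of a write request can be listed with the abbreviations of Table 12-5. The sketch below is illustrative only: the function name is hypothetical, and it assumes each data cycle carries one data element, so a block of eight words occupies four doubleword data cycles.

```python
def write_request_cycles(address, data_elements, coherent):
    """Return (SysCmd, SysAD) pairs for a processor write request.

    The first pair is the address cycle; each later pair is a data cycle.
    The last data cycle carries the end-of-data identifier.
    """
    if not data_elements:
        raise ValueError("a write request needs at least one data element")
    middle, last = ("CData", "CEOD") if coherent else ("NData", "NEOD")
    cycles = [("Write", address)]
    for index, element in enumerate(data_elements):
        is_last = index == len(data_elements) - 1
        cycles.append((last if is_last else middle, element))
    return cycles
```

A single word write is the one-element case: write_request_cycles(addr, [data], coherent=False) yields the address cycle followed by one NEOD data cycle.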



[Timing diagram omitted. Signals shown: SCycle, SClock, SysAD bus, SysCmd bus,
ValidOut*, ValidIn*, RdRdy*, WrRdy*, Release*, with the System interface in
master state. The SysAD bus carries Addr followed by Data0 through Data3;
the SysCmd bus carries Write followed by three CData identifiers and a
final CEOD.]

Figure 12-20 Processor Coherent Block Write Request Protocol

[Timing diagram omitted.]

Figure 12-21 Processor Coherent Block Write Request Protocol (Delayed)

Processor Invalidate and Update Request Protocol 

Processor invalidate request and update request protocols are the same as 
a coherent word write request, except for the following: 

• invalidate and update requests are controlled by RdRdy*, 
while the write request is controlled by WrRdy* 

• the single data cycle transfer is not used by an invalidate
request

Processor invalidate and update requests are acknowledged using the 
signals IvdAck* and IvdErr*. The external agent drives either IvdAck* or 
IvdErr* for one cycle to signal the completion of the current processor 
update or invalidate request; IvdAck* occurs in parallel with requests on 
the SysAD and SysCmd buses. 

IvdAck* or IvdErr* can be driven at any time after a processor update or
invalidate request is issued, provided the update request is compulsory.

The processor pipeline stalls until one of the following occurs:

• IvdAck* or IvdErr* is asserted by the external agent. Assertion 
of IvdAck* indicates a successful invalidation, and the 
processor continues. IvdErr* causes a bus error exception. 

• either an intervention, snoop, update, or invalidate request is 
sent by the external agent, with the Invalidate or Update 
Cancellation bit set, SysCmd(4) = 0, indicating the processor 
invalidate or update request was cancelled. 

If the processor update or invalidate request is cancelled, the instruction 
that caused the processor request is re-executed. If the external request is 
sent with SysCmd(4) = 1, indicating no cancellation, the processor, after 
responding to the external request, stalls again until one of the two
conditions described above terminates the processor's invalidate or update
request. 
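
The way a pending invalidate or update request terminates can be sketched as a loop over the events the external agent may present. The event encoding and the returned strings are assumptions made for this example.

```python
def resolve_stall(events):
    """Return how a stalled processor invalidate or update request ends.

    events is a sequence of "IvdAck", "IvdErr", or ("external", syscmd4)
    where syscmd4 is the value of SysCmd(4) on an external intervention,
    snoop, update, or invalidate request.
    """
    for event in events:
        if event == "IvdAck":
            return "continue"
        if event == "IvdErr":
            return "bus error exception"
        if isinstance(event, tuple) and event[0] == "external":
            if event[1] == 0:
                # Cancellation: the instruction that caused the request
                # is re-executed.
                return "re-execute instruction"
            # SysCmd(4) = 1: no cancellation. The processor responds to
            # the external request and then stalls again.
            continue
        raise ValueError("unknown event: %r" % (event,))
    return "stalled"
```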



Processor Null Write Request Protocol 

A processor null write request is issued with the System interface in 
master state; the request consists of a single address cycle. The processor 
drives a null command on the SysCmd bus and asserts ValidOut* for one
cycle. The SysAD bus is unused during the address cycle associated with 
a null write request, and processor null write requests cannot be 
controlled with either RdRdy* or WrRdy* signals. Figure 12-22 illustrates 
a processor null write request. 



[Timing diagram omitted. Signals shown: SysCmd bus, ValidOut*, ValidIn*,
RdRdy*, WrRdy*, Release*, with the System interface in master state.]

Figure 12-22 Processor Null Write Request Protocol
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Processor Cluster Request Protocol 

In secondary-cache mode, the processor can issue two types of requests: 
individual and cluster. 

All of the requests that are part of a cluster must be accepted by the 
external agent before a response to the read request, that began the cluster, 
can be returned to the processor. A cluster consists of: 

• a processor read with write forthcoming request followed by a 
write request 

• a processor read request followed by a potential update request 

• a processor read with write forthcoming request followed by a 
potential update request, followed by a write request. 
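
The three permitted cluster compositions listed above can be checked mechanically. A minimal sketch, in which the request labels and names are hypothetical:

```python
# The three request sequences that form a cluster, in issue order.
VALID_CLUSTERS = {
    ("RwWF", "write"),
    ("read", "potential update"),
    ("RwWF", "potential update", "write"),
}

def is_valid_cluster(requests):
    """Return True if the request sequence is one of the three clusters."""
    return tuple(requests) in VALID_CLUSTERS
```

The external agent must accept every request of a valid cluster before it returns the response to the read that began the cluster.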

Figure 12-23 illustrates a cluster consisting of a read with write 
forthcoming request, followed by a potential update request, followed by 
a coherent block write request for eight words of data (with minimum 
spacing between the requests that form the cluster), followed by an 
uncompelled change to slave state at the earliest opportunity.

NOTE: Timings for the SysADC and SysCmdP buses are the same as 
those of the SysAD and SysCmd buses, respectively. There may be 
unused cycles between the requests that form a cluster. 



[Timing diagram omitted. Signals shown: SCycle, SClock, SysAD bus, SysCmd bus,
ValidOut*, ValidIn*, RdRdy*, WrRdy*, Release*, with the System interface in
master state.]

Figure 12-23 Processor Cluster Request Protocol

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Processor Request and Cluster Flow Control 

The external agent uses RdRdy* to control the flow of the following
processes:

• processor read request 

• processor invalidate request 

• processor update request 

• processor read request, followed by a potential update request 
within a cluster. 

Figures 12-24 through 12-27 illustrate this flow control, as described in the 
steps below. 

1. The processor samples the signal RdRdy* to determine if the external 
agent is capable of accepting a read, invalidate, update request, or a 
read request followed by a potential update request. 

2. The signal WrRdy* controls the flow of a processor write request. 

3. The processor does not complete the issue of a read, invalidate, or update
request, or of a read request followed by a potential update request, until
it issues the address cycle for that request in a cycle for which the
signal RdRdy* was asserted two cycles earlier.

4. The processor does not complete the issue of a write request until it
issues the address cycle for that write request in a cycle for which the
signal WrRdy* was asserted two cycles earlier.
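
The two-cycle sampling rule in the steps above can be sketched as a check on recorded signal levels. The names are hypothetical; signal traces are lists indexed by cycle, with 0 meaning asserted because RdRdy* and WrRdy* are active low. Treating a cycle with no sample two cycles earlier as not ready is an assumption.

```python
READ_CONTROLLED = ("read", "invalidate", "update", "read+potential update")

def may_issue_address_cycle(request, cycle, rdrdy, wrrdy):
    """Return True if the address cycle of request may be issued at cycle."""
    if request == "write":
        signal = wrrdy                  # writes are controlled by WrRdy*
    elif request in READ_CONTROLLED:
        signal = rdrdy                  # everything else here uses RdRdy*
    else:
        raise ValueError("unknown request type: %r" % (request,))
    sample = cycle - 2
    if sample < 0 or sample >= len(signal):
        return False                    # assumption: no sample, not ready
    return signal[sample] == 0          # 0 = asserted (active low)
```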

Figure 12-24 illustrates two processor write requests in which the issue of 
the second is delayed for the assertion of WrRdy*. 

Figure 12-25 illustrates a processor cluster in which the issue of the read 
and a potential update request are delayed for the assertion of RdRdy*. 

Figure 12-26 illustrates a processor cluster in which the issue of the write 
request is delayed for the assertion of WrRdy*. 

Figure 12-27 illustrates the issue of a processor write request delayed for 
the assertion of WrRdy* and the completion of an external invalidate 
request. 

NOTE: Timings for the SysADC and SysCmdP buses are the same as
those of the SysAD and SysCmd buses, respectively.

[Timing diagrams omitted.]

Figure 12-24 Two Processor Write Requests, Second Write Delayed for the Assertion of WrRdy*

Figure 12-25 Processor Read Request within a Cluster Delayed for the Assertion of RdRdy*

Figure 12-26 Processor Write Request within a Cluster Delayed for the Assertion of WrRdy*

Figure 12-27 Processor Write Request Delayed for the Assertion of WrRdy* and the Completion
of an External Invalidate Request

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External Request Protocols 

External requests can only be issued with the System interface in slave 
state. An external agent asserts ExtRqst* to arbitrate (see External 
Arbitration Protocol, below) for the System interface, then waits for the 
processor to release the System interface to slave state by asserting 
Release* before the external agent issues an external request. If the System 
interface is already in slave state — that is, the processor has previously 
performed an uncompelled change to slave state — the external agent can 
begin an external request immediately. 

After issuing an external request, the external agent must return the 
System interface to master state. If the external agent does not have any 
additional external requests to perform, ExtRqst* must be deasserted two 
cycles after the cycle in which Release* was asserted. For a string of 
external requests, the ExtRqst* signal is asserted until the last request
cycle, whereupon it is deasserted two cycles after the cycle in which 
Release* was asserted. 

The processor continues to handle external requests as long as ExtRqst* is 
asserted; however, the processor cannot release the System interface to 
slave state for a subsequent external request until it has completed the 
current request. As long as ExtRqst* is asserted, the string of external 
requests is not interrupted by a processor request. 

This section describes the following external request protocols: 

• read 

• null 

• write 

• invalidate and update 

• intervention 

• snoop 

• read response 
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External Arbitration Protocol 

System interface arbitration uses the signals ExtRqst* and Release* as 
described above. Figure 12-28 is a timing diagram of the arbitration 
protocol, in which slave and master states are shown. 

The arbitration cycle consists of the following steps: 

1. The external agent asserts ExtRqst* when it wishes to submit an 
external request. 

2. The processor waits until it is ready to handle an external request, 
whereupon it asserts Release* for one cycle. 

3. The processor sets the SysAD and SysCmd buses to tri-state. 

4. The external agent must wait at least two cycles after the assertion of 
Release* before it drives the SysAD and SysCmd buses. 

5. The external agent deasserts ExtRqst* two cycles after the assertion of 
Release*, unless the external agent wishes to perform an additional
external request. 

6. The external agent sets the SysAD and the SysCmd buses to tri-state 
at the completion of an external request. 

The processor can start issuing a processor request one cycle after the 
external agent sets the bus to tri-state. 
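
The cycle offsets in the arbitration steps can be collected into a small helper. This is a sketch only; the function name and dictionary keys are hypothetical, and cycles are counted as plain integers relative to an arbitrary origin.

```python
def arbitration_timeline(release_cycle, more_requests=False):
    """Return key cycle numbers derived from the cycle Release* is asserted."""
    return {
        "release_asserted": release_cycle,
        # The external agent waits at least two cycles before driving
        # the SysAD and SysCmd buses.
        "earliest_external_drive": release_cycle + 2,
        # ExtRqst* is deasserted two cycles after Release*, unless another
        # external request follows (None means it stays asserted).
        "extrqst_deasserted": None if more_requests else release_cycle + 2,
    }
```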

NOTE: Timings for the SysADC and SysCmdP buses are the same as
those of the SysAD and SysCmd buses, respectively. 
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Figure 12-28 Arbitration Protocol for External Requests 
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External Read Request Protocol 

External reads are requests for a word of data from a processor internal 
resource, such as a register. External read requests cannot be split; that is, 
no other request can occur between the external read request and its read 
response. 

Figure 12-29 shows a timing diagram of an external read request, which 
consists of the following steps: 

1. An external agent asserts ExtRqst* to arbitrate for the System 
interface. 

2. The processor releases the System interface to slave state by asserting 
Release* for one cycle and then deasserting Release*. 

3. After Release* is deasserted, the SysAD and SysCmd buses are set to
tri-state for one cycle.

4. The external agent drives a read request command on the SysCmd 
bus and a read request address on the SysAD bus and asserts 
ValidIn* for one cycle.

5. After the address and command are sent, the external agent releases 
the SysCmd and SysAD buses by setting them to tri-state and 
allowing the processor to drive them. The processor, having accessed 
the data that is the target of the read, returns this data to the external 
agent. The processor accomplishes this by driving a data identifier on 
the SysCmd bus, the response data on the SysAD bus, and asserting 
ValidOut* for one cycle. The data identifier indicates that this is
last-data-cycle response data.

6. The System interface is in master state. The processor continues 
driving the SysCmd and SysAD buses after the read response is 
returned. 

NOTE: Timings for the SysADC and SysCmdP buses are the same as 
those of the SysAD and SysCmd buses, respectively. 

External read requests are only allowed to read a word of data from the 
processor. The processor response to external read requests for any data 
element other than a word is undefined. 
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Figure 12-29 External Read Request, System Interface in Master State 

NOTE: The processor does not contain any resources that are 
readable by an external read request; in response to an external read 
request the processor returns undefined data and a data identifier 
with its Erroneous Data bit, SysCmd(5), set. 

External Null Request Protocol 

The processor supports two kinds of external null requests. 

• A secondary cache release external null request returns ownership 
of the secondary cache to the processor while the System 
interface remains in slave state, until another external null 
request returns it to master state. 

• A System interface release external null request returns the System 
interface to master state from slave state without otherwise 
affecting the processor. 
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Any time the processor releases the System interface to slave state to 
accept an external request, it also allows the external agent to use the 
secondary cache, in anticipation of a cache coherence request. When the 
external agent uses the SysAD bus for a transfer unrelated to the processor
(for example, a DMA transfer), this ownership of the secondary cache 
prevents the processor from satisfying subsequent primary cache misses. 
To satisfy such a primary cache miss, the external agent issues a secondary 
cache release external null request, returning ownership of the secondary 
cache to the processor. 

External null requests require no action from the processor other than to 
return the System interface to master state, or to regain ownership of the 
secondary cache. 

Figures 12-30 and 12-31 show timing diagrams of the two external null 
request cycles, which consist of the following steps: 

1. The external agent asserts ExtRqst* to arbitrate for the System 
interface. 

2. The processor releases the System interface to slave state by asserting 
Release*. 

3. The external agent drives a secondary cache release external null 
request command on the SysCmd bus, and asserts ValidIn* for one 
cycle to return the secondary cache interface ownership to the 
processor. 

4. The SysAD bus is unused (does not contain valid data) during the 
address cycle associated with an external null request. 

5. After the address cycle is issued, the null request is complete. 

For a secondary cache release external null request, the System interface 
remains in slave state. 

For a System interface release external null request, the external agent releases 
the SysCmd and SysAD buses, and expects the System interface to return 
to master state. 
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Figure 12-30 Secondary Cache Release External Null Request 
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Figure 12-31 System Interface Release External Null Request 
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External Write Request Protocol 

External write requests use a protocol identical to the processor single 
word write protocol, except that the ValidIn* signal is asserted instead of 
ValidOut*. Figure 12-32 shows a timing diagram of an external write 
request, which consists of the following steps: 

1. The external agent asserts ExtRqst* to arbitrate for the System 
interface. 

2. The processor releases the System interface to slave state by asserting 
Release*. 

3. The external agent drives a write command on the SysCmd bus and a 
write address on the SysAD bus, and asserts ValidIn*. 

4. The external agent drives a data identifier on the SysCmd bus and data 
on the SysAD bus, and asserts ValidIn*. 

5. The data identifier associated with the data cycle must contain a 
coherent or noncoherent last data cycle indication. 

6. After the data cycle is issued, the write request is complete and the 
external agent sets the SysCmd and SysAD buses to a tri-state, 
allowing the System interface to return to master state. Timings for 
the SysADC and SysCmdP buses are the same as those of the SysAD 
and SysCmd buses, respectively. 

External write requests are only allowed to write a word of data to the 
processor. Processor behavior in response to an external write request for 
any data element other than a word is undefined. 

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Figure 12-32 External Write Request, with System Interface initially a Bus Master 
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External Invalidate and Update Request Protocols 

External invalidate and update request protocols are the same as the 
external write request protocol. The data element provided with an 
update or invalidate request can be a doubleword, partial doubleword, 
word, or partial word. The single data cycle transfer is not used (it does 
not contain valid data) for an invalidate request. 

Figure 12-33 illustrates an external invalidate request following an 
uncompelled change to slave state. 

NOTE: Timings for the SysADC and SysCmdP buses are the same as 
those of the SysAD and SysCmd buses, respectively. 




Figure 12-33 External Invalidate Request following an Uncompelled Change to Slave State 
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External Intervention Request Protocol 

External intervention requests use a protocol similar to that of external 
read requests, except that a cache line size block of data can be returned 
along with an indication of the cache state for the cache line. The cache 
state indication depends upon the state of the cache line and the value of 
the data return bit in the intervention request command.† 

The data return bit indicates either return on dirty or return on exclusive: 

• If the data return bit indicates return on dirty, and the cache line 
that is the target of the intervention request is in the dirty exclusive 
or dirty shared state, the contents of the cache line are returned 
in response to the intervention request. 

• If the data return bit indicates return on exclusive, and the cache 
line that is the target of the intervention request is in the clean 
exclusive or dirty exclusive state, the contents of the cache line 
are returned in response to the intervention request. 

If neither of the two cases above is true, the response to the intervention 
request does not include the contents of the cache line, but simply indicates 
the state of the cache line that is the target of the intervention request. 
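
The two data return rules above can be restated as a small predicate. This is only an illustrative sketch: the names CacheState, DataReturn, and intervention_returns_data are hypothetical and do not come from the processor documentation, and the enum values are labels rather than bus encodings.

```python
from enum import Enum


class CacheState(Enum):
    # Labels only; these are not SysCmd encodings.
    INVALID = "invalid"
    SHARED = "shared"
    DIRTY_SHARED = "dirty shared"
    CLEAN_EXCLUSIVE = "clean exclusive"
    DIRTY_EXCLUSIVE = "dirty exclusive"


class DataReturn(Enum):
    ON_DIRTY = "return on dirty"
    ON_EXCLUSIVE = "return on exclusive"


def intervention_returns_data(mode, state):
    """Return True when the response carries the cache line contents."""
    if mode is DataReturn.ON_DIRTY:
        return state in (CacheState.DIRTY_EXCLUSIVE, CacheState.DIRTY_SHARED)
    if mode is DataReturn.ON_EXCLUSIVE:
        return state in (CacheState.CLEAN_EXCLUSIVE, CacheState.DIRTY_EXCLUSIVE)
    raise ValueError(f"unknown data return mode: {mode!r}")
```

In every other combination, including a line that is invalid or simply shared, the response reports the cache state only.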

The case in which the processor returns a cache line state, but not cache 
line contents, is described in the following steps: 

1. The external agent asserts ExtRqst* to arbitrate for the System 
interface. 

2. The processor releases the System interface to slave state by asserting 
Release*. 

3. The external intervention request is driven onto the SysCmd bus and 
the address onto the SysAD bus. ValidIn* is asserted for one cycle. 

4. The processor drives a coherent data identifier that indicates the state 
of the cache line on the SysCmd bus and asserts ValidOut* for one 
cycle. 

5. The SysAD bus is not used during the data cycle. 

6. The data identifier indicates a response data cycle that contains a last 
data cycle indication. 



† If the cache line that is the target of the intervention request is not present in the cache 
(that is, the tag comparison for the cache line at the target cache address fails), the cache 
line that is the target of the intervention request is considered to be in the invalid state. 
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Figure 12-34 shows an external intervention request to a cache line found 
in the shared state, with the System interface initially in a master state. 
Figure 12-35 shows an external intervention request to a cache line found 
in the dirty exclusive state, with the System interface initially in a slave 
state. 

NOTE: Timings for the SysADC and SysCmdP buses are the same as 
those of the SysAD and SysCmd buses, respectively. 
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Figure 12-34 External Intervention Request, Shared Line, System Interface in Master State 
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The case in which the processor returns cache line contents is described in 
the steps below. In this example, the System interface is already in slave state. 

1. The external intervention request is driven onto the SysCmd bus and 
the address onto the SysAD bus. ValidIn* is asserted for one cycle. 

2. The processor drives data on the SysAD bus and a data identifier on 
the SysCmd bus. The processor asserts ValidOut* for each data cycle. 

3. The data identifier associated with the last data cycle must contain a 
last data cycle indicator. 




Figure 12-35 External Intervention Request, Dirty Exclusive Line, System Interface in Slave State 



The processor returns the contents of a cache line, along with an indication 
of the cache state in which it was found, by issuing a sequence of data 
cycles sufficient to transmit the contents of the cache line, as shown in 
Figure 12-35. The data identifier transmitted with each data cycle 
indicates the cache state in which the cache line was found, together with 
an indication that this data is response data. The data identifier associated 
with the last data cycle contains a last data cycle indication. 

If the contents of a cache line are returned in response to an intervention 
request, they are returned in subblock order starting with the doubleword 
at the address supplied with the intervention request. Note, however, that 
if the intervention address targets the doubleword at the beginning of the 
block, subblock ordering is equivalent to sequential ordering. 
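
A short sketch may help show why a transfer that starts at the first doubleword of the block is sequential. This section does not define subblock ordering itself, so the rule used below (exclusive-OR of the starting doubleword index with a binary counter) is an assumption, and the function name subblock_order is hypothetical.

```python
def subblock_order(start_dw, dws_per_block):
    """List doubleword indices in the assumed subblock order.

    Assumption: the n-th doubleword transferred has index start_dw XOR n.
    """
    if dws_per_block <= 0 or dws_per_block & (dws_per_block - 1):
        raise ValueError("block size must be a power of two")
    if not 0 <= start_dw < dws_per_block:
        raise ValueError("starting doubleword lies outside the block")
    return [start_dw ^ n for n in range(dws_per_block)]
```

Under that assumption, a start index of 0 yields 0, 1, 2, 3 for a four-doubleword block, which is plain sequential order, while a start index of 2 yields 2, 3, 0, 1.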
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External Snoop Request Protocol 

External snoop requests use a protocol identical to the external read 
request protocol, except that, instead of returning data, the processor 
responds with an indication of the current cache state for the targeted 
cache line. This protocol is described by the following steps: 

1. The external agent asserts ExtRqst* to arbitrate for the System 
interface. 

2. The processor releases the System interface to slave state by asserting 
Release*. 

3. The external snoop request is driven onto the SysCmd bus and the 
address onto the SysAD bus. ValidIn* is asserted for one cycle. 

4. The processor drives a coherent data identifier on the SysCmd bus 
and asserts ValidOut* for one cycle. 

5. The SysAD bus is unused during the snoop response. 

6. The processor continues driving the SysCmd and SysAD buses after 
the snoop response is returned, to move the System interface back to 
master state. 

Note that if the cache line that is the target of the snoop request is not 
present in the cache — that is, a tag comparison for the cache line at the 
target cache address fails — the cache line that is the target of the snoop 
request is considered to be in the invalid state. 

Figure 12-36 shows an external snoop request submitted with the System 
interface in the master state. Figure 12-37 shows an external snoop request 
submitted with the System interface in slave state. 

NOTE: Timings for the SysADC and SysCmdP buses are the same as 
those of the SysAD and SysCmd buses, respectively. 
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Figure 12-36 External Snoop Request, System Interface in Master State 
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Figure 12-37 External Snoop Request, System Interface in Slave State 
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Read Response Protocol 

An external agent must return data to the processor in response to a 
processor read request by using a read response protocol. A read response 
protocol consists of the following steps: 

1. The external agent waits for the processor to perform an uncompelled 
change to slave state. 

2. The external agent returns the data through a single data cycle or a series 
of data cycles. 

3. After the last data cycle is issued, the read response is complete and 
the external agent sets the SysCmd and SysAD buses to a tri-state. 

4. The System interface returns to master state. 

NOTE: The processor always performs an uncompelled change to 
slave state after issuing a read request. 

5. The data identifier for data cycles must indicate the fact that this data 
is response data. 

6. The data identifier associated with the last data cycle must contain a 
last data cycle indication. 

For read responses to coherent block read requests, each data identifier 
must include the cache state of the response data. The cache state 
provided with each data identifier must be the same and must be clean 
exclusive, dirty exclusive, shared, or dirty shared. The behavior of the 
processor is undefined if the cache state provided with the data identifiers 
changes during the transfer of the block of data, or if the cache state 
provided is invalid. 

The data identifier associated with a data cycle can indicate that the data 
transmitted during that cycle is erroneous; however, an external agent 
must return a data block of the correct size regardless of the fact that the 
data may be in error. If a read response includes one or more erroneous 
data cycles, the processor then takes a bus error. 

Read response data must only be delivered to the processor when a 
processor read request is pending. The behavior of the processor is 
undefined when a read response is presented to it and there is no 
processor read pending. Further, if the processor issues a read-with-write- 
forthcoming request, a processor write request or a processor null write 
request must be accepted before the read response can be returned. The 
behavior of the processor is undefined if the read response is returned 
before a processor write request is accepted. 




Figure 12-38 illustrates a processor word read request followed by a word 
read response. Figure 12-39 illustrates a read response for a processor 
block read with the System interface already in slave state. 

NOTE: Timings for the SysADC and SysCmdP buses are the same as 
those of the SysAD and SysCmd buses, respectively. 







Figure 12-38 Processor Word Read Request, followed by a Word Read Response 
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Figure 12-39 Block Read Response, System Interface already in Slave State 
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12.7 Data Rate Control 

The System interface supports a maximum data rate of one doubleword 
per cycle. The data rate the processor can support is directly related to the 
secondary cache access time; if the access time is too long, the processor 
cannot transmit and accept data at the maximum rate. 

The rate at which data is delivered to the processor can be determined by 
the external agent — for example, the external agent can drive data and 
assert ValidIn* every n cycles, instead of every cycle. An external agent 
can deliver data at any rate it chooses, but must not deliver data to the 
processor any faster than the processor is capable of receiving it. 

The processor only accepts cycles as valid when ValidIn* is asserted and 
the SysCmd bus contains a data identifier; thereafter, the processor 
continues to accept data until it receives the data word tagged as the last 
one. 

Data Transfer Patterns 

A data pattern is a sequence of letters indicating the data and unused cycles 
that repeat to provide the appropriate data rate. For example, the data 
pattern DDxx specifies a repeatable data rate of two doublewords every 
four cycles, with the last two cycles unused. Table 12-6 lists the maximum 
processor data rate for each of the possible secondary cache write cycle 
times, and the most efficient data pattern for each data rate. 



Table 12-6 Transmit Data Rates and Patterns 

Maximum Data Rate            Data Pattern   Maximum Secondary Cache Access 
--------------------------   ------------   ------------------------------ 
1 Double/1 SClock Cycle      D              4 PCycles 
2 Doubles/3 SClock Cycles    DDx            6 PCycles 
1 Double/2 SClock Cycles     DDxx           8 PCycles 
1 Double/2 SClock Cycles     DxDx           8 PCycles 
2 Doubles/5 SClock Cycles    DDxxx          10 PCycles 
1 Double/3 SClock Cycles     DDxxxx         12 PCycles 
1 Double/3 SClock Cycles     DxxDxx         12 PCycles 
1 Double/4 SClock Cycles     DDxxxxxx       16 PCycles 
1 Double/4 SClock Cycles     DxxxDxxx       16 PCycles 




In Tables 12-6 and 12-7, data patterns are specified using the letters D and 
x; D indicates a data cycle and x indicates an unused cycle. Figure 12-40 
shows a read response in which data is provided to the processor at a rate 
of two doublewords every three cycles using the data pattern DDx. 
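
The pattern notation can be made concrete with a short sketch that computes the data rate of a pattern and the cycles on which doublewords are transferred. The helper names pattern_rate and data_cycles are hypothetical, and cycle numbering from 0 is chosen here only for illustration.

```python
from fractions import Fraction


def _check(pattern):
    # A pattern is a non-empty string of 'D' (data) and 'x' (unused) cycles.
    if not pattern or set(pattern) - {"D", "x"} or "D" not in pattern:
        raise ValueError(f"not a valid data pattern: {pattern!r}")


def pattern_rate(pattern):
    """Doublewords transferred per cycle when the pattern repeats."""
    _check(pattern)
    return Fraction(pattern.count("D"), len(pattern))


def data_cycles(pattern, count):
    """Cycle numbers (from 0) on which the first `count` doublewords move."""
    _check(pattern)
    cycles, cycle = [], 0
    while len(cycles) < count:
        if pattern[cycle % len(pattern)] == "D":
            cycles.append(cycle)
        cycle += 1
    return cycles
```

For DDx, pattern_rate gives 2/3 of a doubleword per cycle, and four doublewords occupy cycles 0, 1, 3, and 4, leaving cycle 2 unused.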



[Timing diagram; signals shown: SCycle, SClock, SysAD Bus, SysCmd Bus, 
ValidOut*, ValidIn*, ExtRqst*, Release*] 



Figure 12-40 Read Response, Reduced Data Rate, System Interface in Slave State 



Secondary Cache Transfers 

The processor operates most efficiently if data is delivered in pairs of 
doublewords, since the secondary cache is organized as a 128-bit RAM 
array. The most efficient way of reducing the data rate is to deliver a pair 
of doublewords followed by some number of unused cycles, followed by 
another pair of doublewords. The secondary cache write cycle time 
should determine the rate at which this pattern is repeated. However, the 
processor accepts data in any pattern as long as the time between the 
transfer of any pair of odd-numbered doublewords is greater than, or 
equal to, the write cycle time of the secondary cache. Doublewords in the 
transfer pattern are numbered beginning at 0: the odd-numbered 
doublewords are the second, fourth, sixth, and so on. 
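
The spacing rule for odd-numbered doublewords can be checked mechanically. The sketch below is illustrative only: it assumes the write cycle time has already been converted to SClock cycles, and the function names are hypothetical.

```python
def data_cycles(pattern, count):
    """Cycle numbers (from 0) on which the first `count` doublewords move."""
    if not pattern or set(pattern) - {"D", "x"} or "D" not in pattern:
        raise ValueError(f"not a valid data pattern: {pattern!r}")
    cycles, cycle = [], 0
    while len(cycles) < count:
        if pattern[cycle % len(pattern)] == "D":
            cycles.append(cycle)
        cycle += 1
    return cycles


def odd_doubleword_spacing_ok(pattern, write_cycle_sclocks, doublewords=8):
    """Check the gap between successive odd-numbered doublewords.

    Doublewords are numbered from 0, so the odd-numbered ones are the
    second, fourth, sixth, and so on. Checking neighbouring pairs is
    enough, because any wider pair is spaced at least as far apart.
    """
    odd = data_cycles(pattern, doublewords)[1::2]
    return all(b - a >= write_cycle_sclocks for a, b in zip(odd, odd[1:]))
```

With a write cycle time of 3 SClock cycles, DDx passes (odd-numbered doublewords arrive 3 cycles apart) while the full-rate pattern D fails (they arrive 2 cycles apart).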
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Secondary Cache Write Cycle Time 

Behavior of the processor is undefined if, based on the secondary cache 
write cycle time, data is delivered to the processor faster than the 
processor can handle it. Secondary cache write cycle time is defined as the 
sum of the parameters: 

TWr1Dly, TWrSUp, and TWrRc 

These parameters are defined in Chapter 9, Table 9-1. 

The rate at which the processor transmits data to an external agent is 
programmable at boot time through the boot-time mode control interface. 
The transmit data rate can be programmed to any of the data rates and 
data patterns listed in Table 12-6, as long as the programmed data rate 
does not exceed the maximum rate the processor can handle, based on the 
secondary cache write cycle time. The behavior of the processor is 
undefined if a programmed transmit data rate exceeds the maximum the 
processor can support. 

Figure 12-41 shows a processor write request in which the processor 
transmit data rate is programmed as one doubleword every two cycles, 
using the data pattern DDxx. 
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Figure 12-41 Processor Write Request, Transmit Data Rate Reduced 
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Table 12-7 shows the maximum transmit data rates for a given set of 
secondary cache parameters, based on a PClock-to-SClock divisor of 2. To 
find the maximum allowable secondary cache write cycle time and 
secondary cache access time, multiply the maximum secondary cache 
numbers for each pattern by: 

(PClock_to_SClock_Divisor ) / 2 

The minimum number for these parameters is always the minimum access 
time supported by the processor. 
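
As a worked example of the scaling rule, the sketch below multiplies the divisor-of-2 limits from Table 12-7 by (divisor / 2) and picks the fastest pattern that a given write cycle time allows. The names BASE_LIMITS, max_write_cycle_pcycles, and best_pattern are hypothetical, and the code does not check whether a particular divisor is supported by the processor.

```python
# Upper write cycle time limits (PCycles) from Table 12-7, which assumes a
# PClock-to-SClock divisor of 2. Listed fastest pattern first.
BASE_LIMITS = [("D", 4), ("DDx", 6), ("DDxx", 8), ("DDxxx", 10), ("DDxxxx", 12)]


def max_write_cycle_pcycles(pattern, divisor):
    """Scale the tabulated limit for `pattern` by divisor / 2."""
    for name, base in BASE_LIMITS:
        if name == pattern:
            return base * divisor // 2  # base is even, so this is exact
    raise ValueError(f"pattern not in Table 12-7: {pattern!r}")


def best_pattern(write_cycle_pcycles, divisor):
    """Fastest tabulated pattern whose scaled limit covers the write cycle."""
    for name, base in BASE_LIMITS:
        if write_cycle_pcycles <= base * divisor // 2:
            return name
    raise ValueError("write cycle time exceeds every tabulated limit")
```

For instance, a 5-PCycle write cycle time calls for DDx with a divisor of 2, but the full-rate pattern D remains usable with a divisor of 4, because the limit for D scales from 4 to 8 PCycles.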



Table 12-7 Maximum Transmit Data Rates 

Secondary Cache 
Write Cycle Time   Maximum Data Rate            Best Data Pattern 
----------------   --------------------------   ----------------- 
1-4 PCycles        1 Double/1 SClock Cycle      D 
5-6 PCycles        2 Doubles/3 SClock Cycles    DDx 
7-8 PCycles        1 Double/2 SClock Cycles     DDxx 
9-10 PCycles       2 Doubles/5 SClock Cycles    DDxxx 
11-12 PCycles      1 Double/3 SClock Cycles     DDxxxx 


Independent Transmissions on the SysAD Bus 

In most applications, the SysAD bus is a point-to-point connection, 
running from the processor to a bidirectional registered transceiver 
residing in an external agent. For these applications, the SysAD bus has 
only two possible drivers, the processor or the external agent. 

Certain applications may require connection of additional drivers and 
receivers to the SysAD bus, to allow transmissions over the SysAD bus 
that the processor is not involved in. These are called independent 
transmissions. To effect an independent transmission, the external agent 
must coordinate control of the SysAD bus by using arbitration handshake 
signals and external null requests. 




An independent transmission on the SysAD bus follows this procedure: 

1. The external agent requests mastership of the SysAD bus, to issue an 
external request. 

2. The processor releases the System interface to slave state. 

3. If the processor is being used with a secondary cache, the external 
agent issues a secondary cache release external null request to return 
ownership of the secondary cache to the processor. 

4. The external agent then allows the independent transmission to take 
place on the SysAD bus, making sure that ValidIn* is not asserted 
while the transmission is occurring. 

5. When the transmission is complete, the external agent must issue a 
System interface release external null request to return the System 
interface to master state. 
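
The procedure above can be walked through with a toy model that tracks only the System interface state and secondary cache ownership. This is a sketch for following the sequence of steps, not a model of bus timing; the class and function names are hypothetical.

```python
class SystemInterfaceModel:
    """Toy model: bus mastership plus secondary cache ownership."""

    def __init__(self, has_secondary_cache):
        self.has_secondary_cache = has_secondary_cache
        self.state = "master"
        self.scache_owner = "processor" if has_secondary_cache else None
        self.log = []

    def release_to_slave(self):
        # Steps 1-2: the external agent arbitrates and the processor
        # releases the interface; the agent may also use the secondary cache.
        self.state = "slave"
        if self.has_secondary_cache:
            self.scache_owner = "external agent"
        self.log.append("release to slave")

    def external_null(self, kind):
        if self.state != "slave":
            raise RuntimeError("external null requests require slave state")
        if kind == "secondary cache release":
            if self.has_secondary_cache:
                self.scache_owner = "processor"
        elif kind == "system interface release":
            self.state = "master"
        else:
            raise ValueError(f"unknown external null request: {kind!r}")
        self.log.append(kind)


def independent_transmission(model):
    model.release_to_slave()                               # steps 1-2
    if model.has_secondary_cache:
        model.external_null("secondary cache release")     # step 3
    model.log.append("independent transmission (ValidIn* deasserted)")  # step 4
    model.external_null("system interface release")        # step 5
    return model.log
```

Running independent_transmission on a model with a secondary cache ends with the interface back in master state and the processor owning the secondary cache again.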

System Interface Endianness 

The endianness of the System interface is programmed at boot time 
through the boot-time mode control interface, and remains fixed until the 
next time the processor mode bits are read. Software cannot change the 
endianness of the System interface and the external system; software can 
set the reverse endian bit to reverse the interpretation of endianness inside 
the processor, but the endianness of the System interface remains 
unchanged. 
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12.8 System Interface Cycle Time 

The processor specifies minimum and maximum cycle counts for various 
processor transactions and for the processor response time to external 
requests. Processor requests themselves are constrained by the System 
interface request protocol, and request cycle counts can be determined by 
examining the protocol. The following System interface interactions can 
vary within minimum and maximum cycle counts: 

• spacing between requests within a cluster (cluster request 
spacing) 

• waiting period for the processor to release the System interface 
to slave state in response to an external request (release latency) 

• response time for an external request that requires a response 
(external response latency). 

The remainder of this section describes and tabulates the minimum and 
maximum cycle counts for these System interface interactions. 

Cluster Request Spacing 

Processor internal activity determines the minimum and maximum 
number of unused cycles allowed between the requests within a cluster. 

• The minimum number of unused cycles allowed between 
requests within a cluster is 0: in other words, the requests can 
be adjacent. 

• The maximum number of unused cycles separating requests 
within a cluster varies depending on the requests that form the 
cluster. 

Table 12-8 summarizes the minimum and maximum number of unused 
cycles allowed between requests within a cluster. 



Table 12-8 Unused Cycles Separating Requests within a Cluster 

From Processor   To Processor   Minimum Unused   Maximum Unused 
Request          Request        SClock Cycles    SClock Cycles 
--------------   ------------   --------------   -------------- 
Read             Update         0                2 
Read             Write          0                2 
Update           Write          0                2 
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Release Latency 

Release latency is generally defined as the number of cycles the processor 
can wait to release the System interface to slave state for an external 
request. When no processor requests are in progress, internal activity — 
such as refilling the primary cache from the secondary cache — can cause 
the processor to wait some number of cycles before releasing the System 
interface. Release latency is therefore more specifically defined as the 
number of cycles that occur between the assertion of ExtRqst* and the 
assertion of Release*. 

There are three categories of release latency: 

• Category 1: when the external request signal is asserted two 
cycles before the last cycle of a processor request, or two cycles 
before the last cycle of the last request in a cluster. 

• Category 2: when the external request signal is not asserted 
during a processor request or cluster, or is asserted during the 
last cycle of a processor request or cluster. 

• Category 3: when the processor makes an uncompelled change 
to slave state. 

Table 12-9 summarizes the minimum and maximum release latencies for 
requests that fall into categories 1, 2, 3a and 3b (category 3a is a read; 
category 3b is a read with write forthcoming). Note that the maximum 
and minimum cycle count values are subject to change. 



Table 12-9 Release Latency for External Requests 

Category   Minimum PCycles   Maximum PCycles 
--------   ---------------   --------------- 
1          4                 6 
2          4                 24 
3a         0                 See (3a), below 
3b         0                 See (3b), below 

(3a) Read = 
      TDis 
    + 4- or 8-word Secondary cache write cycle time 
      (depending upon Primary cache size) 
    + 4-word Secondary cache write cycle time 
    + Secondary cache line size 
    + 16 PCycles 

(3b) Read With Write Forthcoming = 
      4-word Secondary cache write cycle time 
    + 4 PCycles 
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External Request Response Latency 

The number of cycles the processor takes to respond to an external 
intervention request, read request, or snoop request is referred to as the 
intervention response latency, external read response latency, or snoop response 
latency, respectively. 

The number of latency cycles is the number of unused cycles between the 
address cycle of the request and the first data cycle of the response. 
Intervention response latency and snoop response latency are a function 
of processor internal activity and secondary cache access time. 
Table 12-10 summarizes the minimum and maximum intervention response 
latency and snoop response latency. Note that the latency values are 
subject to change. 



Table 12-10 Intervention Response and Snoop Response Latencies 

Maximum Secondary   Intervention Response   Snoop Response 
Cache Access        Latency                 Latency 
                    Min       Max           Min       Max 
-----------------   -------   -------       -------   ------- 
1-4 PCycles         6         26            6         26 
5-6 PCycles         8         28            8         28 
7-8 PCycles         10        30            10        30 
9-10 PCycles        12        32            12        32 
11-12 PCycles       14        34            14        34 



External read response latency is a function of processor internal activity. 
Minimum and maximum external read response latency is 4 PCycles. 
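
Table 12-10 can be read as a simple range lookup. The sketch below is one way to express that lookup; the names LATENCY_ROWS and response_latency_bounds are hypothetical, and the returned values are exactly the tabulated ones, which the text notes are subject to change.

```python
# (upper bound of secondary cache access range in PCycles, min, max),
# taken row by row from Table 12-10. The intervention and snoop columns
# hold the same values, so one table serves both.
LATENCY_ROWS = [(4, 6, 26), (6, 8, 28), (8, 10, 30), (10, 12, 32), (12, 14, 34)]


def response_latency_bounds(scache_access_pcycles):
    """Return (min, max) intervention or snoop response latency."""
    if scache_access_pcycles < 1:
        raise ValueError("access time must be at least 1 PCycle")
    for upper, low, high in LATENCY_ROWS:
        if scache_access_pcycles <= upper:
            return low, high
    raise ValueError("access time is beyond the range of Table 12-10")
```

A secondary cache access time of 5 PCycles falls in the 5-6 row, so the bounds are 8 and 28.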
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12.9 System Interface Commands and Data Identifiers 

System interface commands specify the nature and attributes of any 
System interface request; this specification is made during the address 
cycle for the request. System interface data identifiers specify the 
attributes of data transmitted during a System interface data cycle. 

The following sections describe the syntax, that is, the bitwise encoding of 
System interface commands and data identifiers. 

Reserved bits and reserved fields in the command or data identifier 
should be set to 1 for System interface commands and data identifiers 
associated with external requests. For System interface commands and 
data identifiers associated with processor requests, reserved bits and 
reserved fields in the command and data identifier are undefined. 

Command and Data Identifier Syntax 

System interface commands and data identifiers are encoded in 9 bits and 
are transmitted on the SysCmd bus from the processor to an external 
agent, or from an external agent to the processor, during address and data 
cycles. Bit 8 (the most-significant bit) of the SysCmd bus determines 
whether the current content of the SysCmd bus is a command or a data 
identifier and, therefore, whether the current cycle is an address cycle or a 
data cycle. For System interface commands, SysCmd(8) must be set to 0. 
For System interface data identifiers, SysCmd(8) must be set to 1. 
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System Interface Command Syntax 

This section describes the SysCmd bus encoding for System interface 
commands. Figure 12-42 shows a common encoding used for all System 
interface commands. 



  8     7            5     4               0 
  0     Request Type       Request Specific 



Figure 12-42 System Interface Command Syntax Bit Definition 

SysCmd(8) must be set to 0 for all System interface commands. 

SysCmd(7:5) specify the System interface request type, which may be read, 
write, null, invalidate, update, intervention, or snoop; Table 12-11 lists the 
encoding of SysCmd(7:5). 

Table 12-11 shows the types of requests encoded by the SysCmd(7:5) bits. 

Table 12-11 Encoding of SysCmd(7:5) for System Interface Commands 

SysCmd(7:5)   Command 
-----------   ----------------------------------- 
0             Read Request 
1             Read-With-Write-Forthcoming Request 
2             Write Request 
3             Null Request 
4             Invalidate Request 
5             Update Request 
6             Intervention Request 
7             Snoop Request 



SysCmd(4:0) are specific to each type of request and are defined in each of 
the following sections. 
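
The common command layout can be sketched as a decoder for a 9-bit SysCmd value. This is an illustration only: the function name decode_syscmd and the dictionary keys are hypothetical, and data identifier fields are left undecoded because their layout is not covered at this point.

```python
REQUEST_TYPES = {
    0: "Read Request",
    1: "Read-With-Write-Forthcoming Request",
    2: "Write Request",
    3: "Null Request",
    4: "Invalidate Request",
    5: "Update Request",
    6: "Intervention Request",
    7: "Snoop Request",
}


def decode_syscmd(value):
    """Split a 9-bit SysCmd value into its top-level fields."""
    if not 0 <= value < 1 << 9:
        raise ValueError("SysCmd is a 9-bit bus")
    if value & 0x100:
        # SysCmd(8) = 1: data identifier, so this is a data cycle.
        return {"kind": "data identifier", "bits_7_0": value & 0xFF}
    # SysCmd(8) = 0: command, so this is an address cycle.
    return {
        "kind": "command",
        "request_type": REQUEST_TYPES[(value >> 5) & 0x7],
        "request_specific": value & 0x1F,
    }
```

For example, decode_syscmd(0b0_010_11011) reports a Write Request command with a request-specific field of 0b11011.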
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Read Requests 

Figure 12-43 shows the format of a SysCmd read request. 

  8     7   5     4   3   2   1   0 
  0      00*      Read Request Specific (see tables) 

Figure 12-43 Read Request SysCmd Bus Bit Definition 

Tables 12-12 through 12-14 list the encodings of SysCmd(4:0) for read 
requests. 

Table 12-12 Encoding of SysCmd(4:3) for Read Requests 

SysCmd(4:3)   Read Attributes 
-----------   ----------------------------------------------------- 
0             Coherent block read 
1             Coherent block read, exclusivity requested 
2             Noncoherent block read 
3             Doubleword, partial doubleword, word, or partial word 



Table 12-13 Encoding of SysCmd(2:0) for Coherent and Noncoherent 
Block Read Request 

SysCmd(2)     Link Address Retained Indication 
-----------   -------------------------------- 
0             Link address not retained 
1             Link address retained 

SysCmd(1:0)   Read Block Size 
-----------   --------------- 
0             4 words 
1             8 words 
2             16 words 
3             32 words 
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Table 12-14 Doubleword, Word, or Partial-word Read Request Data Size 
Encoding of SysCmd(2:0) 

SysCmd(2:0)   Read Data Size 
-----------   -------------------------- 
0             1 byte valid (Byte) 
1             2 bytes valid (Halfword) 
2             3 bytes valid (Tribyte) 
3             4 bytes valid (Word) 
4             5 bytes valid (Quintibyte) 
5             6 bytes valid (Sextibyte) 
6             7 bytes valid (Septibyte) 
7             8 bytes valid (Doubleword) 
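
The read request tables can be combined into a small encoder. The sketch below is illustrative: the function and parameter names are hypothetical, SysCmd(8) is left at 0 as required for commands, and it assumes that a read-with-write-forthcoming request uses the same SysCmd(4:0) encodings as a plain read request.

```python
READ_ATTRS = {"coherent": 0, "coherent_exclusive": 1, "noncoherent": 2}
BLOCK_SIZES = {4: 0, 8: 1, 16: 2, 32: 3}  # block size in words


def encode_block_read(attr, words, link_retained=False, write_forthcoming=False):
    """Build the SysCmd value for a block read request."""
    req_type = 1 if write_forthcoming else 0          # SysCmd(7:5)
    return (
        (req_type << 5)
        | (READ_ATTRS[attr] << 3)                     # SysCmd(4:3)
        | (int(link_retained) << 2)                   # SysCmd(2)
        | BLOCK_SIZES[words]                          # SysCmd(1:0)
    )


def encode_partial_read(nbytes, write_forthcoming=False):
    """Build the SysCmd value for a read of 1 to 8 valid bytes."""
    if not 1 <= nbytes <= 8:
        raise ValueError("data size must be 1 to 8 bytes")
    req_type = 1 if write_forthcoming else 0
    return (req_type << 5) | (3 << 3) | (nbytes - 1)  # SysCmd(2:0) = size - 1
```

A word read is therefore encode_partial_read(4), which gives 0b0_000_11_011.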



Write Requests 

Figure 12-44 shows the format of a SysCmd write request. 

Table 12-15 lists the write attributes encoded in bits SysCmd(4:3). Table 
12-16 lists the block write replacement attributes encoded in bits 
SysCmd(2:0). Table 12-17 lists the write request bit encodings in 
SysCmd(2:0). 



  8     7   5     4   3   2   1   0 
  0      010      Write Request Specific (see tables) 



Figure 12-44 Write Request SysCmd Bus Bit Definition 

Table 12-15 Write Request Encoding of SysCmd(4:3) 

SysCmd(4:3)   Write Attributes 
-----------   ----------------------------------------------------- 
0             Reserved 
1             Reserved 
2             Block write 
3             Doubleword, partial doubleword, word, or partial word 
Table 12-16 Block Write Request Encoding of SysCmd(2:0) 

SysCmd(2)     Cache Line Replacement Attributes 
-----------   --------------------------------- 
0             Cache line replaced 
1             Cache line retained† 

SysCmd(1:0)   Write Block Size 
-----------   ---------------- 
0             4 words 
1             8 words 
2             16 words 
3             32 words 



† The only time the processor sets this bit is if a Hit Writeback causes the processor 
to execute a write request (see Cache Write Policy in Chapter 11). 



Table 12-17 Doubleword, Word, or Partial-word Write Request Data Size 
Encoding of SysCmd(2:0) 

SysCmd(2:0)   Write Data Size 
-----------   -------------------------- 
0             1 byte valid (Byte) 
1             2 bytes valid (Halfword) 
2             3 bytes valid (Tribyte) 
3             4 bytes valid (Word) 
4             5 bytes valid (Quintibyte) 
5             6 bytes valid (Sextibyte) 
6             7 bytes valid (Septibyte) 
7             8 bytes valid (Doubleword) 
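
The write request encodings in Tables 12-15 through 12-17 can be assembled the same way. This is a sketch under the stated tables only; the function names are hypothetical.

```python
WRITE_REQUEST = 2                          # SysCmd(7:5) = 010
BLOCK_SIZES = {4: 0, 8: 1, 16: 2, 32: 3}   # block size in words


def encode_block_write(words, line_retained=False):
    """Build the SysCmd value for a block write request."""
    return (
        (WRITE_REQUEST << 5)
        | (2 << 3)                         # SysCmd(4:3) = 2: block write
        | (int(line_retained) << 2)        # SysCmd(2): 1 = cache line retained
        | BLOCK_SIZES[words]               # SysCmd(1:0)
    )


def encode_partial_write(nbytes):
    """Build the SysCmd value for a write of 1 to 8 valid bytes."""
    if not 1 <= nbytes <= 8:
        raise ValueError("data size must be 1 to 8 bytes")
    return (WRITE_REQUEST << 5) | (3 << 3) | (nbytes - 1)
```

An 8-word block write with the line replaced encodes as 0b0_010_10_001, and a word write as 0b0_010_11_011.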
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Null Requests 

Figure 12-45 shows the format of a SysCmd null request. 



  8     7   5     4   3   2   1   0 
  0      011      Null Request Specific (see tables) 



Figure 12-45 Null Request SysCmd Bus Bit Definition 



Processor null write requests, System interface release external null 
requests, and secondary cache release external null requests all use the 
null request command. Table 12-18 lists the encodings of SysCmd(4:3) for 
processor null write requests. Table 12-19 lists the encodings of 
SysCmd(4:3) for external null requests. 

SysCmd(2:0) are reserved for both instances of null requests. 



Table 12-18 Processor Null Write Request Encoding of SysCmd(4:3) 

SysCmd(4:3)   Null Write Attributes 
-----------   --------------------- 
0             Null write 
1             Reserved 
2             Reserved 
3             Reserved 


Table 12-19 External Null Request Encoding of SysCmd(4:3) 

SysCmd(4:3)   Null Attributes 
-----------   ------------------------ 
0             System Interface release 
1             Secondary cache release 
2             Reserved 
3             Reserved 
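
As an illustration of Table 12-19, the sketch below builds the two external null request commands. It follows the earlier guidance that reserved bits in commands associated with external requests should be set to 1, so SysCmd(2:0) is driven as 111. The function name and the string keys are hypothetical.

```python
NULL_REQUEST = 3  # SysCmd(7:5) = 011
EXTERNAL_NULL_ATTRS = {
    "system interface release": 0,
    "secondary cache release": 1,
}


def encode_external_null(kind):
    """Build the 9-bit SysCmd value for an external null request."""
    attr = EXTERNAL_NULL_ATTRS[kind]   # SysCmd(4:3)
    reserved = 0b111                   # SysCmd(2:0): reserved, set to 1
    return (NULL_REQUEST << 5) | (attr << 3) | reserved
```

Under these assumptions a System interface release encodes as 0x067 and a secondary cache release as 0x06F.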
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Invalidate Requests 



Figure 12-46 shows the format for an invalidate request, and Table 12-20 
lists the encodings of SysCmd(4:0) for an external invalidate request. 

SysCmd(4:0) are reserved on a processor invalidate request. 



Bit(s)    8    7:5    4:0
Value     0    100    Invalidate Request Specific (see table)

Figure 12-46 Invalidate Request SysCmd Bus Bit Definition

Table 12-20 Encoding of SysCmd(4:0) for External Invalidate Requests

SysCmd(4)      Processor Unacknowledged Invalidate or Update Cancellation
0              Invalidate or Update cancelled
1              No cancellation

SysCmd(3:0)    Reserved



Update Requests 



Figure 12-47 shows the format for a SysCmd update request. 

Bit(s)    8    7:5    4:0
Value     0    101    Update Request Specific (see tables)

Figure 12-47 Update Request SysCmd Bus Bit Definition

Table 12-21 lists the encodings of SysCmd(4:0) for external update 
requests. Table 12-22 lists the encodings of SysCmd(4:0) for processor 
update requests. The remaining upper bits are the same for both processor 
and external update requests. 
Table 12-21 Encoding of SysCmd(4:0) for External Update Requests

SysCmd(4)      Processor Unacknowledged Invalidate or Update Cancellation
0              Invalidate or Update cancelled
1              No cancellation

SysCmd(3)      [illegible]
0              [illegible]
1              No change to cache state

SysCmd(2:0)    Update Data Size
0              1 byte valid (Byte)
1              2 bytes valid (Halfword)
2              3 bytes valid (Tribyte)
3              4 bytes valid (Word)
4              5 bytes valid (Quintibyte)
5              6 bytes valid (Sextibyte)
6              7 bytes valid (Septibyte)
7              8 bytes valid (Doubleword)


Table 12-22 Encoding of SysCmd(4:0) for Processor Update Requests

SysCmd(4)      Reserved

SysCmd(3)      Update type
0              Compulsory
1              Potential

SysCmd(2:0)    Update Data Size
0              1 byte valid (Byte)
1              2 bytes valid (Halfword)
2              3 bytes valid (Tribyte)
3              4 bytes valid (Word)
4              5 bytes valid (Quintibyte)
5              6 bytes valid (Sextibyte)
6              7 bytes valid (Septibyte)
7              8 bytes valid (Doubleword)
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Intervention and Snoop Requests 

Figure 12-48 shows the format of an intervention request; Figure 12-49
shows the format of a snoop request. Table 12-23 lists the encodings of
SysCmd(4:0) for intervention requests; Table 12-24 lists the encodings of
SysCmd(4:0) for snoop requests.



Bit(s)    8    7:5    4:0
Value     0    110    Intervention Request Specific (see table)

Figure 12-48 Intervention Request SysCmd Bus Bit Definition



Table 12-23 Encodings of SysCmd(4:0) for Intervention Requests

SysCmd(4)      Processor Unacknowledged Invalidate or Update Cancellation
0              Update or Invalidate cancelled
1              No cancellation

SysCmd(3)      Response to Dirty or Exclusive State
0              Return cache line data if in the dirty exclusive or dirty
               shared state
1              Return cache line data if in the clean exclusive or dirty
               exclusive state

SysCmd(2:0)    Cache State Change Function
0              No change to cache state
1              If cache state is clean exclusive, change to shared;
               otherwise no change to cache state
2              If cache state is clean exclusive or shared, change to
               invalid; otherwise no change to cache state
3              If cache state is clean exclusive, change to shared; if cache
               state is dirty exclusive, change to dirty shared; otherwise
               make no change to cache state
4              If cache state is clean exclusive, dirty exclusive, or dirty
               shared, change to shared; otherwise make no change to
               cache state
5              Change to invalid regardless of current cache state
6              Reserved
7              Reserved
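
The cache state change function in SysCmd(2:0) can be read as a small state-transition function. The C sketch below is illustrative only: the type and function names are hypothetical, and treating the reserved encodings 6 and 7 as "no change" is an assumption made here so that the function is total, not documented processor behavior.

```c
/* Hypothetical cache state type; the names follow the states in the text. */
typedef enum {
    CS_INVALID,
    CS_CLEAN_EXCLUSIVE,
    CS_DIRTY_EXCLUSIVE,
    CS_SHARED,
    CS_DIRTY_SHARED
} cache_state_t;

/* Apply the cache state change function encoded in SysCmd(2:0). */
static cache_state_t apply_state_change(cache_state_t s, unsigned fn)
{
    switch (fn & 0x7u) {
    case 1:
        return (s == CS_CLEAN_EXCLUSIVE) ? CS_SHARED : s;
    case 2:
        return (s == CS_CLEAN_EXCLUSIVE || s == CS_SHARED) ? CS_INVALID : s;
    case 3:
        if (s == CS_CLEAN_EXCLUSIVE) return CS_SHARED;
        if (s == CS_DIRTY_EXCLUSIVE) return CS_DIRTY_SHARED;
        return s;
    case 4:
        if (s == CS_CLEAN_EXCLUSIVE || s == CS_DIRTY_EXCLUSIVE ||
            s == CS_DIRTY_SHARED)
            return CS_SHARED;
        return s;
    case 5:
        return CS_INVALID;
    default:
        /* 0 = no change; 6 and 7 are reserved and are treated as
         * "no change" here purely as an assumption of this sketch. */
        return s;
    }
}
```

The same function applies to the SysCmd(2:0) field of snoop requests, whose encodings are identical.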
Bit(s)    8    7:5    4:0
Value     0    111    Snoop Request Specific (see table)

Figure 12-49 Snoop Request SysCmd Bus Bit Definition

Table 12-24 Encodings of SysCmd(4:0) for Snoop Requests

SysCmd(4)      Processor Unacknowledged Update Cancellation
0              Update cancelled
1              No cancellation

SysCmd(3)      Reserved

SysCmd(2:0)    Cache State Change Function
0              No change to cache state
1              If cache state is clean exclusive, change to shared state;
               otherwise make no change to cache state
2              If cache state is clean exclusive or shared, change to
               invalid state; otherwise make no change to cache state
3              If cache state is clean exclusive, change to shared; if
               cache state is dirty exclusive, change to dirty shared;
               otherwise make no change to cache state
4              If cache state is clean exclusive, dirty exclusive, or
               dirty shared, change to shared; otherwise make no
               change to cache state
5              Change to invalid regardless of current cache state
6              Reserved
7              Reserved
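
The request formats in this section share one layout: SysCmd(8) is 0 and SysCmd(7:5) selects the request type. A minimal C sketch of that decode follows. The enum and function names are hypothetical; encodings of SysCmd(7:5) that are not described in this section are grouped under a single "other" value rather than guessed.

```c
#include <stdint.h>

typedef enum {
    REQ_NULL,           /* SysCmd(7:5) = 011 */
    REQ_INVALIDATE,     /* SysCmd(7:5) = 100 */
    REQ_UPDATE,         /* SysCmd(7:5) = 101 */
    REQ_INTERVENTION,   /* SysCmd(7:5) = 110 */
    REQ_SNOOP,          /* SysCmd(7:5) = 111 */
    REQ_OTHER,          /* encodings not covered in this section */
    REQ_NOT_A_COMMAND   /* SysCmd(8) = 1: a data identifier */
} req_type_t;

/* Classify a 9-bit SysCmd value by its command field. */
static req_type_t syscmd_request_type(uint16_t syscmd)
{
    if (syscmd & 0x100u)
        return REQ_NOT_A_COMMAND;

    switch ((syscmd >> 5) & 0x7u) {
    case 3:  return REQ_NULL;
    case 4:  return REQ_INVALIDATE;
    case 5:  return REQ_UPDATE;
    case 6:  return REQ_INTERVENTION;
    case 7:  return REQ_SNOOP;
    default: return REQ_OTHER;
    }
}
```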
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System Interface Data Identifier Syntax 

This section defines the encoding of the SysCmd bus for System interface 
data identifiers. Figure 12-50 shows a common encoding used for all 
System interface data identifiers. 



Bit(s)    8    7            6            5           4                 3           2:0
Field     1    Last Data    Resp Data    Err Data    See Note below    Reserved    Cache State

Figure 12-50 Data Identifier SysCmd Bus Bit Definition



SysCmd(8) must be set to 1 for all System interface data identifiers. 

NOTE: SysCmd(4) is reserved for processor data identifiers. In an
external data identifier, SysCmd(4) indicates whether or not to check
the data and check bits for error.

System interface data identifiers have two formats, one for coherent data 
and another for noncoherent data. 



Coherent Data 

Coherent data is defined as follows: 

• data that is returned in response to a processor coherent block
read request

• data that is returned in response to an external intervention
request.

Noncoherent Data 

Noncoherent data is defined as follows: 

• data that is associated with processor block write requests and 
processor doubleword, partial doubleword, word, or partial 
word write requests 

• data that is returned in response to a processor noncoherent 
block read request or a processor doubleword, partial 
doubleword, word, or partial word read request 

• data that is associated with external update requests 

• data that is associated with external write requests 

• data that is returned in response to an external read request

• data that is associated with processor update requests. 
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Data Identifier Bit Definitions 

SysCmd(7) marks the last data element and SysCmd(6) indicates whether
or not the data is response data, for both processor and external coherent
and noncoherent data identifiers. Response data is data returned in
response to a read request or an intervention request.

SysCmd(5) indicates whether or not the data element is error free. 

Erroneous data contains an uncorrectable error and is returned to the 
processor, forcing a bus error. In the case of a block response, the entire 
line must be delivered to the processor no matter how minimal the error. 
The processor delivers data with the good data bit deasserted if a primary 
parity error is detected for a transmitted data item. If the system is in ECC 
mode, a secondary cache data ECC error is detected by comparing the 
values transmitted on the SysAD and SysADC. 

SysCmd(4) indicates to the processor whether to check the data and check 
bits for this data element, for both coherent and noncoherent external data 
identifiers. 

SysCmd(3) is reserved for external data identifiers.

SysCmd(4:3) are reserved for both coherent and noncoherent processor
data identifiers.

For coherent data identifiers, SysCmd(2:0) indicate the cache state in
which to load the cache line in response to a processor coherent read
request. SysCmd(2:0) also indicate the cache state for response data to an
external intervention request, or for the data cycle issued in response to
an external snoop request. SysCmd(2:0) are reserved for noncoherent data
identifiers.

Table 12-25 lists the encodings of SysCmd(7:3) for processor data 
identifiers. Table 12-26 lists the encodings of SysCmd(7:3) for external 
data identifiers. Table 12-27 lists the encodings of SysCmd(2:0) for 
coherent data identifiers. 
Table 12-25 Processor Data Identifier Encoding of SysCmd(7:3)

SysCmd(7)      Last Data Element Indication
0              Last data element
1              Not the last data element

SysCmd(6)      Response Data Indication
0              Data is response data
1              Data is not response data

SysCmd(5)      Good Data Indication
0              Data is error free
1              Data is erroneous

SysCmd(4:3)    Reserved


Table 12-26 External Data Identifier Encoding of SysCmd(7:3)

SysCmd(7)      Last Data Element Indication
0              Last data element
1              Not the last data element

SysCmd(6)      Response Data Indication
0              Data is response data
1              Data is not response data

SysCmd(5)      Good Data Indication
0              Data is error free
1              Data is erroneous

SysCmd(4)      Data Checking Enable
0              Check the data and check bits
1              Do not check the data and check bits

SysCmd(3)      Reserved
Table 12-27 Coherent Data Identifiers Encoding of SysCmd(2:0)

SysCmd(2:0)    Cache State
0              Invalid†
1              Reserved
2              Reserved
3              Reserved
4              Clean Exclusive
5              Dirty Exclusive
6              Shared
7              Dirty Shared

†This state also occurs if the line does not exist in the cache.
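
As an illustration of Tables 12-26 and 12-27, the following C sketch unpacks an external coherent data identifier. The struct and function names are hypothetical. Note that the flag bits are active low in the tables: a 0 means "last", "response", "error free", and "check".

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical decoded view of an external data identifier. */
typedef struct {
    bool is_data_id;     /* SysCmd(8) = 1 */
    bool last;           /* SysCmd(7) = 0: last data element */
    bool response;       /* SysCmd(6) = 0: data is response data */
    bool good;           /* SysCmd(5) = 0: data is error free */
    bool check_enabled;  /* SysCmd(4) = 0: check data and check bits */
    unsigned cache_state;/* SysCmd(2:0), meaningful for coherent data only */
} data_id_t;

static data_id_t decode_external_data_id(uint16_t syscmd)
{
    data_id_t d;
    d.is_data_id    = (syscmd & 0x100u) != 0;
    d.last          = (syscmd & 0x080u) == 0;
    d.response      = (syscmd & 0x040u) == 0;
    d.good          = (syscmd & 0x020u) == 0;
    d.check_enabled = (syscmd & 0x010u) == 0;
    d.cache_state   = syscmd & 0x7u;
    return d;
}
```

For instance, the value 0x105 decodes as the last element of error-free response data, with checking enabled, to be loaded in the dirty exclusive state (encoding 5).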



12.10 System Interface Addresses 

System interface addresses are full 36-bit physical addresses presented on 
the least-significant 36 bits (bits 35 through 0) of the SysAD bus during 
address cycles; the remaining bits of the SysAD bus are unused during 
address cycles. 



Addressing Conventions 

Addresses associated with doubleword, partial doubleword, word, or
partial word transactions and update requests are aligned for the size of
the data element. The system uses the following address conventions:

• Addresses associated with block requests are aligned to
doubleword boundaries; that is, the low-order 3 bits of
address are 0.

• Doubleword requests set the low-order 3 bits of address to 0. 

• Word requests set the low-order 2 bits of address to 0. 

• Halfword requests set the low-order bit of address to 0. 

• Byte, tribyte, quintibyte, sextibyte, and septibyte requests use 
the byte address. 
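
The alignment conventions above can be expressed as a simple check. The C sketch below is illustrative; the function names are hypothetical, and the size argument is the number of valid bytes in the data element (1 through 8).

```c
#include <stdbool.h>
#include <stdint.h>

/* Keep only the 36 physical address bits carried on SysAD(35:0). */
static uint64_t sysad_phys_addr(uint64_t sysad)
{
    return sysad & ((UINT64_C(1) << 36) - 1);
}

/* Return true if addr follows the addressing convention for a
 * non-block request of nbytes valid bytes. */
static bool sysad_addr_aligned(uint64_t addr, unsigned nbytes)
{
    switch (nbytes) {
    case 8:  return (addr & 0x7u) == 0;  /* doubleword */
    case 4:  return (addr & 0x3u) == 0;  /* word */
    case 2:  return (addr & 0x1u) == 0;  /* halfword */
    case 1: case 3: case 5: case 6: case 7:
        return true;                     /* byte address is used as-is */
    default:
        return false;                    /* not a valid element size */
    }
}
```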
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Sequential and Subblock Ordering 

The order in which data is returned in response to a processor block read 
request can be programmed to sequential ordering or subblock ordering, 
using the boot-time mode control interface. Appendix C has more 
information about subblock ordering. Either sequential or subblock 
ordering may be enabled, as follows: 

• If sequential ordering is enabled on a block read request, the 
processor delivers the address of the doubleword at the start of 
the block. An external agent must return the block of data 
sequentially from the beginning of the block. 

• If subblock ordering is enabled, the processor delivers the 
address of the requested doubleword within the block. An 
external agent must return the block of data using subblock 
ordering, starting with the addressed doubleword. 

NOTE: Only R4000SC and R4000MC configurations (using a 
secondary cache) can be programmed to use sequential ordering. 

For block write requests, the processor always delivers the address of the
doubleword at the beginning of the block; the processor delivers data
beginning with the doubleword at the beginning of the block and
progresses sequentially through the doublewords that form the block.

During data cycles, the valid byte lines depend upon the position of the
data with respect to the aligned doubleword (this may be a byte, halfword,
tribyte, quadbyte/word, quintibyte, sextibyte, septibyte, or an
octalbyte/doubleword). For example, in little-endian mode, on a byte
request where the address modulo 8 is 0, SysAD(7:0) are valid during the
data cycles.
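
The little-endian example above can be generalized into a byte-lane mask. The C sketch below is an assumption-laden illustration, not a statement of documented behavior: it assumes little-endian mode, assumes that byte lane k corresponds to SysAD(8k+7:8k), and assumes that the data element does not cross the aligned doubleword. The function name is hypothetical.

```c
#include <stdint.h>

/* Return an 8-bit mask in which bit k is set when byte lane k,
 * SysAD(8k+7:8k), is assumed valid (little-endian mode only).
 * Returns 0 for a size or position that does not fit in one
 * aligned doubleword. */
static uint8_t sysad_valid_byte_lanes_le(uint64_t addr, unsigned nbytes)
{
    unsigned offset = (unsigned)(addr & 0x7u);

    if (nbytes == 0 || nbytes > 8 || offset + nbytes > 8)
        return 0;
    return (uint8_t)(((1u << nbytes) - 1u) << offset);
}
```

With these assumptions, a byte request at an address whose value modulo 8 is 0 yields the mask 0x01, which matches the SysAD(7:0) case given in the text.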

12.11 Processor Internal Address Map 

External reads and writes provide access to processor internal resources 
that may be of interest to an external agent. The processor decodes bits 
SysAD(6:4) of the address associated with an external read or write 
request to determine which processor internal resource is the target. 
However, the processor does not contain any resources that are readable 
through an external read request. Therefore, in response to an external 
read request the processor returns undefined data and a data identifier 
with its Erroneous Data bit, SysCmd(5), set. The Interrupt register is the 
only processor internal resource available for write access by an external 
request. The Interrupt register is accessed by an external write request 
with an address of binary 000 on bits 6:4 of the SysAD bus.
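
A minimal C sketch of this decode follows. The function name is hypothetical; only the SysAD(6:4) field and the Interrupt register encoding come from the text.

```c
#include <stdbool.h>
#include <stdint.h>

/* Return true if an external write address selects the Interrupt
 * register, that is, if SysAD(6:4) is binary 000. */
static bool targets_interrupt_register(uint64_t sysad_addr)
{
    return ((sysad_addr >> 4) & 0x7u) == 0;
}
```

Only bits 6:4 take part in the decode, so addresses that differ in other bits select the same resource.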
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The R4000SC and R4000MC versions of the R4000 processor contain 
interface signals for an optional external secondary cache. This interface 
consists of: 

• a 128-bit data bus 

• a 25-bit tag bus 

• an 18-bit address bus 

• various static random access memory (SRAM) control signals. 

The 128-bit-wide data bus minimizes the primary cache miss penalty, and 
allows the use of standard low-cost SRAMs in the design of the secondary 
cache. 

The remainder of the System interface signals are described in Chapter 8. 
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13.1 Data Transfer Rates 

The interface to the secondary cache maximizes service of primary cache 
misses. The Secondary Cache interface, SCData(127:0), supports a data 
rate that is close to the processor-to-primary-cache bandwidth during 
normal operation. To ensure that this bandwidth is maintained, each data, 
tag, and check pin must be connected to a single SRAM device. 

The SCAddr bus, together with the SCOE*, SCDCS*, and SCTCS*
signals, drives a large number of SRAM devices; because of this, one level
of external buffering between the processor and the cache array is used.

13.2 Duplicating Signals 

The buffered control signals limit the speed of the Secondary Cache
interface. Critical control signals are duplicated by design to minimize
this limitation: the SCWR* signal and SCAddr(0) have four versions so
that external buffers are not needed to drive them. When an 8-word
(256-bit) primary cache line is used, these signals can be controlled
quickly, reducing the time of back-to-back transfers.

Each duplicated control signal can drive up to 11 SRAMs; therefore, a total
of 44 SRAM packages can be used in the cache array. This allows a cache
design using 16-Kbyte-by-4-bit, 64-Kbyte-by-4-bit, or 256-Kbyte-by-4-bit
standard SRAM.†
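
The figure of 44 packages can be cross-checked against the bus widths used in this chapter: SCData(127:0), SCDChk(15:0), SCTag(24:0), and SCTChk(6:0). The C sketch below is only a worked arithmetic check under the assumption that every data, check, and tag pin connects to exactly one SRAM data pin; the names are hypothetical.

```c
/* Bus widths taken from the signal names in this chapter. */
enum {
    SCDATA_BITS = 128,
    SCDCHK_BITS = 16,
    SCTAG_BITS  = 25,
    SCTCHK_BITS = 7
};

/* Number of SRAM packages needed to cover every data, check, and tag
 * pin, given the data width of one package. */
static unsigned sram_packages(unsigned bits_per_package)
{
    unsigned total = SCDATA_BITS + SCDCHK_BITS + SCTAG_BITS + SCTCHK_BITS;
    return (total + bits_per_package - 1u) / bits_per_package;
}
```

The total is 176 bits, so 4-bit-wide devices require 44 packages (four groups of 11), and 8-bit-wide devices require 22.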

The benefit of duplicating SCAddr(0) is greater in systems that use fast
sequential static cache RAM and an 8-word primary cache line. If
SCAddr(0) is attached to the SRAM address bit that affects column decode
only, the read cycle time should approximate the output enable time of the
RAM. For fast static RAM, this cycle time should be half of the nominal
read cycle time.






† Other cache designs within this constraint are also acceptable. For example, a smaller
cache design can use 22 8-Kbyte-by-8-bit static RAMs; this design presents less load on the
address pins and control signals, and reduces the overall parts count.
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13.3 Accessing a Split Secondary Cache 

When the secondary cache is split into separate instruction and data 
portions, assertion of the high-order SCAddr bit, SCAddr(17), enables the 
instruction half of the cache. 

It is possible to design a cache that supports both joint and split
instruction/data configurations of less than the maximum cache size; in
doing so, SCAddr(12:0) must address the cache in all configurations.
SCAddr(17) must support the split instruction/data configuration, and
any of SCAddr(16:14) bits can be omitted, because of the fixed width of the
physical tag array.



13.4 SCDChk Bus

The secondary cache data check bus, SCDChk, is divided into two fields 
to cover the upper and lower 64 bits of SCData. This form is required by 
the 64-bit width of internal data paths. 



13.5 SCTag Bus



The secondary cache tag bus, SCTag, is divided into three fields, as shown 
in Figure 13-1. The CS field indicates the cache state: invalid, clean 
exclusive, dirty exclusive, shared, or dirty shared. The PIdx field is an 
index to the virtual address of primary cache lines that can contain data 
from the secondary cache. Bits 18:0 contain the upper physical address. 



Bits      24:22    21:19    18:0
Field     CS       PIdx     Physical_Tag

Figure 13-1 SCTag Fields
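
The three SCTag fields can be separated with shifts and masks. The following C sketch is illustrative; the struct and function names are hypothetical, and the 25-bit tag is assumed to be held in the low bits of a 32-bit integer.

```c
#include <stdint.h>

/* Hypothetical unpacked view of SCTag(24:0). */
typedef struct {
    unsigned cs;    /* bits 24:22, cache state */
    unsigned pidx;  /* bits 21:19, primary cache index */
    uint32_t ptag;  /* bits 18:0, upper physical address */
} sctag_t;

static sctag_t sctag_unpack(uint32_t raw)
{
    sctag_t t;
    t.cs   = (raw >> 22) & 0x7u;
    t.pidx = (raw >> 19) & 0x7u;
    t.ptag = raw & 0x7FFFFu;
    return t;
}
```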



The SCDCS* and SCTCS* signals disable reads or writes of either the data 
array or tag array when the opposite array is being accessed. These signals 
are useful for saving power on snoop and invalidate requests since access 
to the data array is not necessary. These signals also write data from the 
primary data cache to the secondary cache. 
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13.6 Operation of the Secondary Cache Interface 

The secondary cache can be configured for various clock rates and static
RAM speeds. All configurable parameters are specified in multiples of
PClock, which runs at twice the frequency of the external system clock,
MasterClock.

During boot time, secondary cache timing parameters are programmed 
through the boot-time mode bits, as described in Chapter 9. Table 13-1 
lists the secondary cache timing parameters. The following sections 
describe secondary cache read and write cycles. 

Table 13-1 Secondary Cache Timing Parameters 



Symbol     Number of Cycles
tRd1Cyc    4-15 PCycles
tRd2Cyc    2-15 PCycles
tDis       2-7 PCycles
tWr1Dly    1-3 PCycles
tWr2Dly    1-3 PCycles
tWrRc      0-1 PCycles
tWrSUp     3-15 PCycles
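
When choosing boot-time values, the ranges in Table 13-1 can be checked mechanically. The C sketch below is illustrative only; the struct and function names are hypothetical, and it validates ranges only, not whether a setting suits a particular SRAM.

```c
#include <stdbool.h>

/* Hypothetical container for the Table 13-1 parameters, in PCycles. */
typedef struct {
    unsigned tRd1Cyc, tRd2Cyc, tDis, tWr1Dly, tWr2Dly, tWrRc, tWrSUp;
} sc_timing_t;

static bool in_range(unsigned v, unsigned lo, unsigned hi)
{
    return v >= lo && v <= hi;
}

/* Return true if every parameter lies within its Table 13-1 range. */
static bool sc_timing_valid(const sc_timing_t *t)
{
    return in_range(t->tRd1Cyc, 4, 15) &&
           in_range(t->tRd2Cyc, 2, 15) &&
           in_range(t->tDis,    2, 7)  &&
           in_range(t->tWr1Dly, 1, 3)  &&
           in_range(t->tWr2Dly, 1, 3)  &&
           in_range(t->tWrRc,   0, 1)  &&
           in_range(t->tWrSUp,  3, 15);
}
```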
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Read Cycles 

There are two basic read cycles: 4-word read and 8-word read.

Each secondary cache read cycle begins by driving an address out on the 
address pins. The output enable signal SCOE* is asserted at the same 
time. 

This section describes both 4-word and 8-word read cycles, including 
timing diagrams. 



4-Word Read Cycle 

The 4-word read cycle has two user-accessible timing parameters:

tRd1Cyc    read sequence cycle time, which specifies the
           time from the assertion of the SCAddr bus to
           the sampling of the SCData bus

tDis       cache output disable time, which specifies the
           time from the end of a read cycle to the start of
           the next write cycle

Figure 13-2 illustrates the 4-word read cycle, including the two user- 
accessible timing parameters. 

[Timing diagram: PCycles 1 through 6; address, SCData(127:0), SCTag(24:0),
SCDChk(15:0), SCTChk(6:0), SCOE*, SCAPar(2:0), SCDCS*, and SCTCS*;
intervals tRd1Cyc and tDis marked]

Figure 13-2 Timing Diagram of a 4-Word Read Cycle
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8-Word Read Cycle 

The 8-word read cycle has an additional user-accessible parameter beyond
that of the 4-word read cycle described above: tRd2Cyc, the time from the
first sample point to the second sample point.

In an 8-word read cycle, the low-order address bit, SCAddr(0), changes at
the same time as the first read sample point.

Figure 13-3 illustrates the 8-word read cycle, including the three user- 
accessible timing parameters. 



[Timing diagram: PCycles 1 through 8; SCAddr(17:1), SCAddr(0)
(First_Address, Second_Address), SCAPar(2:0), SCData(127:0), SCTag(24:0),
SCDChk(15:0), SCOE*, SCDCS*, and SCTCS*; intervals tRd1Cyc and tRd2Cyc
marked]

Figure 13-3 Timing Diagram of an 8-Word Read Cycle

Notes on a Secondary Cache Read Cycle 

All read cycles can be aborted by changing the address; a new cycle begins
with the edge on which the address is changed. Additionally, the period
tDis after a read cycle can be interrupted any time by the start of a new
read cycle. If a read cycle is aborted by a write cycle, SCOE* must be
deasserted for the tDis period before the write cycle can begin.

Read cycles can also be extended indefinitely. There is no requirement to 
change the address at the end of a read cycle. 
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Write Cycles 

There are two basic write cycles: a 4-word write cycle and an 8-word write 
cycle. The secondary cache write cycle begins with the assertion of an 
address onto the address pins. 

This section describes both 4-word and 8-word write cycles, including
timing diagrams.

4-Word Write Cycle 

A 4-word write cycle has three timing parameters:

tWr1Dly    delay from the assertion of the address to the
           assertion of SCWR*

tWrSUp     delay from assertion of the second data doubleword
           to the deassertion of SCWR*

tWrRc      delay from the deassertion of SCWR* to the
           beginning of the next cycle

The timing parameter tWrRc is 0 for most cache designs. Note that the
upper data doubleword and the lower data doubleword are normally
driven one cycle apart; this reduces the peak current consumption in the
output drivers.

Figure 13-4 illustrates the 4-word write cycle. Either the upper or lower
data doubleword can be driven first.
[Timing diagram: SCAddr(17:0), SCData(63:0)/SCDChk(7:0) or
SCData(127:64)/SCDChk(15:8), SCTChk(6:0)/SCTag(24:0), SCAPar(2:0), SCWR*,
SCOE*, SCDCS*, and SCTCS*; intervals tWr1Dly, tWrSUp, and tWrRc marked]

Figure 13-4 Timing Diagram of a 4-Word Write Cycle

8-Word Write Cycle 

An 8-word write cycle has one additional parameter beyond those used by
the 4-word write cycle: tWr2Dly. This is the time period that begins when
the low-order address bit SCAddr(0) changes and ends when SCWR* is
asserted for the second time. The lower half of SCData is driven on the
same edge as the change in SCAddr(0).

Figure 13-5 illustrates the 8-word write cycle. 
[Timing diagram: SCAddr(17:1), SCAddr(0) (First Address, Second Address),
SCData(63:0)/SCDChk(7:0), SCData(127:64)/SCDChk(15:8),
SCTag(24:0)/SCTChk(6:0), SCAPar(2:0), SCWR*, SCOE*, SCDCS*, and SCTCS*;
intervals tWrSUp, tWr2Dly, and tWrRc marked]

Figure 13-5 Timing Diagram of an 8-Word Write Cycle

Notes on a Secondary Cache Write Cycle 

When receiving data from the System interface, the first data doubleword
can arrive several cycles before the second data doubleword. In this case,
the cache state machine enters a wait-state that extends SCWR* until
the tWrSUp period after the second data item is transmitted.
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The R4000 processor provides a boundary-scan interface that is 
compatible with Joint Test Action Group (JTAG) specifications, using the 
industry-standard JTAG protocol. 

This chapter describes that interface, including descriptions of boundary 
scanning, the pins and signals used by the interface, and the Test Access 
Port (TAP). 
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14.1 What Boundary Scanning Is 

With the evolution of ever-denser integrated circuits (ICs), surface-mounted
devices, double-sided component mounting on printed-circuit
boards (PCBs), and buried vias, in-circuit tests that depend upon making 
physical contact with internal board and chip connections have become 
more and more difficult to use. The greater complexity of ICs has also 
meant that tests to fully exercise these chips have become much larger and 
more difficult to write. 

One solution to this difficulty has been the development of boundary-scan 
circuits. A boundary-scan circuit is a series of shift register cells placed 
between each pin and the internal circuitry of the IC to which the pin is 
connected, as shown in Figure 14-1. Normally, these boundary-scan cells
are bypassed; when the IC enters test mode, however, the scan cells can be 
directed by the test program to pass data along the shift register path and 
perform various diagnostic tests. To accomplish this, the tests use the four 
signals described in the next section: JTDI, JTDO, JTMS, and JTCK. 




Figure 14-1 JTAG Boundary-scan Cells 
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14.2 Signal Summary 

The JTAG interface signals are listed below and shown in Figure 14-2. 



JTDI JTAG serial data in 

JTDO JTAG serial data out 

JTMS JTAG test mode select 

JTCK JTAG serial clock input 




Figure 14-2 JTAG Interface Signals and Registers 

The JTAG boundary-scan mechanism (referred to in this chapter as the JTAG
mechanism) allows testing of the connections between the processor, the
printed circuit board to which it is attached, and the other components on 
the circuit board. 

In addition, the JTAG mechanism provides rudimentary capability for 
low-speed logical testing of the secondary cache RAM. The JTAG 
mechanism does not provide any capability for testing the processor itself. 
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14.3 JTAG Controller and Registers 

The processor contains the following JTAG controller and registers: 

• Instruction register 

• Boundary-scan register 

• Bypass register 

• Test Access Port (TAP) controller 

The processor executes the standard JTAG EXTEST operation associated 
with External Test functionality testing. 

Instruction Register 

The JTAG Instruction register includes three shift register-based cells; this
register is used to select the test to be performed and/or the test data
register to be accessed. As listed in Table 14-1, this encoding selects either
the Boundary-scan register or the Bypass register.



Table 14-1 JTAG Instruction Register Bit Encoding 



MSB ... LSB    Data Register
0 0 0          Boundary-scan register (external test only)
x x 1          Bypass register
x 1 x          Bypass register
1 x x          Bypass register
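
Table 14-1 reduces to a single rule: only the all-zero encoding selects the Boundary-scan register, and every other encoding selects the Bypass register. A minimal C sketch, with hypothetical names, is:

```c
typedef enum {
    JTAG_DR_BOUNDARY_SCAN,
    JTAG_DR_BYPASS
} jtag_dr_t;

/* Select the data register from the 3-bit Instruction register value. */
static jtag_dr_t jtag_select_dr(unsigned ir)
{
    return ((ir & 0x7u) == 0) ? JTAG_DR_BOUNDARY_SCAN : JTAG_DR_BYPASS;
}
```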



The Instruction register has two stages: 

• shift register 

• parallel output latch 

Figure 14-3 shows the format of the Instruction register. 



[Figure: 3-bit Instruction register, MSB at left, LSB at right]

Figure 14-3 Instruction Register
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Bypass Register 

The Bypass register is 1 bit wide. When the TAP controller is in the Shift- 
DR (Bypass) state, the data on the JTDI pin is shifted into the Bypass 
register, and the Bypass register output shifts to the JTDO output pin. 

In essence, the Bypass register is a short-circuit which allows bypassing of
board-level devices, in the serial boundary-scan chain, which are not
required for a specific test. The logical location of the Bypass register in the
boundary-scan chain is shown in Figure 14-4. Use of the Bypass register
speeds up access to boundary-scan registers in those ICs that remain
active in the board-level test datapath.




Figure 14-4 Bypass Register Operation 
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Boundary-Scan Register 

The Boundary-scan register is a single, 319-bit-wide, shift register-based 
path containing cells connected to all input and output pads on the R4000 
processor. Figure 14-5 shows the three most-significant bits of the 
Boundary-scan register; these three bits control the output enables on the 
various bidirectional buses. 



Bit       319    318    317    316:1
Field     OE3    OE2    OE1    See Table 14-2

Figure 14-5 Output Enable Bits of the Boundary-scan Register

The most-significant bit, OE3 (bit 319), is the JTAG output enable bit for 
the SysAD, SysADC, SysCmd, and SysCmdP buses. Output is enabled 
when this bit is set to 1. 

OE2 (bit 318) is the JTAG output enable for the SCData and SCDChk 
buses. Output is enabled when this bit is set to 1. 

OE1 (bit 317) is the JTAG output enable for the SCTag and SCTChk buses.
Output is enabled when this bit is set to 1.

The remaining 316 bits correspond to 316 signal pads of the processor.

At the end of this chapter, Table 14-2 lists the scan order of these 316 scan
bits, starting from JTDI and ending with JTDO.
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Test Access Port (TAP) 

The Test Access Port (TAP) consists of the four signal pins: JTDI, JTDO, 
JTMS, and JTCK. Serial test data and instructions are communicated 
over these four signal pins, along with control of the test to be executed. 

As Figure 14-6 shows, data is serially scanned into one of the three
registers (Instruction register, Bypass register, or the Boundary-scan register)
from the JTDI pin, or it is scanned from one of these three registers onto
the JTDO pin.

The JTDI input feeds the least-significant bit (LSB) of the selected register, 
whereas the most-significant bit (MSB) of the selected register appears on 
the JTDO output. 

The JTMS input controls the state transitions of the main TAP controller 
state machine. 

The JTCK input is a dedicated test clock that allows serial JTAG data to be 
shifted synchronously, independent of any chip-specific or system clocks. 




Figure 14-6 JTAG Test Access Port 



Data on the JTDI and JTMS pins is sampled on the rising edge of the 
JTCK input clock signal. Data on the JTDO pin changes on the falling 
edge of the JTCK clock signal. 






TAP Controller 

The processor implements the 16-state TAP controller as defined in the 
IEEE JTAG specification. 

Controller Reset 

The TAP controller state machine can be put into Reset state by one of the 
following: 

• deassertion of the VCCOk input resets the TAP controller 

• keeping the JTMS input signal asserted through five 
consecutive rising edges of JTCK input sends the TAP 
controller state machine into its Reset state. 

In either case, keeping JTMS asserted maintains the Reset state. 

Controller States 

The TAP controller has four states: Reset, Capture, Shift, and Update.
They can reflect either instructions (as in the Shift-IR state) or data (as in 
the Capture-DR state). 

• When the TAP controller is in the Reset state, the value 0x7 is 
loaded into the parallel output latch, selecting the Bypass 
register as default. The three most significant bits of the 
Boundary-scan register are cleared to 0, disabling the outputs. 

• When the TAP controller is in the Capture-IR state, the value
0x4 is loaded into the shift register stage.

• When the TAP controller is in the Capture-DR (Boundary-scan)
state, the data currently on the processor input and I/O pins is
latched into the Boundary-scan register. In this state, the
Boundary-scan register bits corresponding to output pins are
arbitrary and cannot be checked during the scan out process.

• When the TAP controller is in the Shift-IR state, data is loaded 
serially into the shift register stage of the Instruction register 
from the JTDI input pin, and the MSB of the Instruction 
register's shift register stage is shifted onto the JTDO pin. 
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• When the TAP controller is in the Shift-DR (Boundary-scan) 
state, data is serially shifted into the Boundary-scan register 
from the JTDI pin, and the contents of the Boundary-scan 
register are serially shifted onto the JTDO pin. 

• When the TAP controller is in the Update-IR state, the current 
data in the shift register stage is loaded into the parallel output 
latch. 

• When the TAP controller is in the Update-DR (Boundary-scan) 
state, data in the Boundary-scan register is latched into the 
register parallel output latch. Bits corresponding to output 
pins, and those I/O pins whose outputs are enabled (by the 
three MSBs of the Boundary-scan register), are loaded onto the 
processor pins. 

Table 14-2 shows the boundary scan order of the processor signals. 



Table 14-2 JTAG Scan Order of R4000 Processor Pins 

Pin # Signal Name       Pin # Signal Name       Pin # Signal Name       Pin # Signal Name
1. SCDChk(13)           2. SysADC(1)            3. SCDChk(1)            4. SysADC(5)
5. SCDChk(5)            6. Status(0)            7. Status(1)            8. Status(2)
9. Status(3)            10. IvdErr*             11. Status(4)           12. IvdAck*
13. Status(5)           14. Status(6)           15. Status(7)           16. SCDChk(7)
17. SysADC(7)           18. SCDChk(3)           19. SysADC(3)           20. SCDChk(15)
21. VCCOk               22. SCTag(16)           23. SCDChk(11)          24. SCData(63)
25. SysAD(63)           26. SCData(31)          27. SysAD(31)           28. SCData(127)
29. SCTag(17)           30. SCData(95)          31. SCData(62)          32. SysAD(62)
33. SCData(30)          34. SysAD(30)           35. SCData(126)         36. SCTag(18)
37. SCData(94)          38. RClock(1:0)‡        39. SCTag(19)           40. SCData(61)
41. SysAD(61)           42. SCData(29)          43. SysAD(29)           44. SCData(125)
45. Reset*              46. SCTag(20)           47. SCData(93)          48. SCData(60)
49. SysAD(60)           50. SCData(28)          51. SysAD(28)           52. SCData(124)
53. ColdReset*          54. SCTag(21)           55. SCData(92)          56. SCData(59)
57. SysAD(59)           58. SCData(27)          59. SysAD(27)           60. SCData(123)
61. IOIn                62. SCTag(22)           63. SCData(91)          64. SCData(58)
65. SysAD(58)           66. SCData(26)          67. SysAD(26)           68. SCData(122)
69. IOOut               70. SCTag(23)           71. SCData(90)          72. SCData(57)
73. SysAD(57)           74. SCData(25)          75. SysAD(25)           76. SCData(121)
77. GrpRun*             78. SCTag(24)           79. SCData(89)          80. SCData(56)
81. SysAD(56)           82. SCData(24)          83. SysAD(24)           84. SCData(120)
85. GrpStall*           86. SCTChk(0)           87. SCData(88)          88. SCDChk(6)
89. SysADC(6)           90. SCDChk(2)           91. SysADC(2)           92. SCDChk(14)
93. NMI*                94. SCTChk(1)           95. SCDChk(10)          96. SCData(55)

‡ These signals share the same JTAG bit. 
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Table 14-2 (cont.) JTAG Scan Order of R4000 Processor Pins 



Pin # Signal Name       Pin # Signal Name       Pin # Signal Name       Pin # Signal Name
97. SysAD(55)           98. SCData(23)          99. SysAD(23)           100. SCData(119)
101. Release*           102. SCTChk(2)          103. SCData(87)         104. SCData(54)
105. SysAD(54)          106. SysAD(22)          107. ModeIn             108. SCData(22)
109. RdRdy*             110. SCData(118)        111. SCData(86)         112. SCData(53)
113. SysAD(53)          114. SCData(21)         115. SysAD(21)          116. SCData(117)
117. ExtRqst*           118. SCTChk(3)          119. SCData(85)         120. SCData(52)
121. SysAD(52)          122. SCData(20)         123. SysAD(20)          124. SCData(116)
125. ValidOut*          126. SCTChk(4)          127. SCData(84)         128. SCData(51)
129. SysAD(51)          130. SCData(19)         131. SysAD(19)          132. SCData(115)
133. ValidIn*           134. SCTChk(5)          135. SCData(83)         136. SCAddr0W,X‡
137. SCAddr0Y,Z‡        138. SCAddr(1)          139. SCData(50)         140. SysAD(50)
141. SCData(18)         142. SysAD(18)          143. SCData(114)        144. Int*(0)
145. SCTChk(6)          146. SCData(82)         147. SCData(49)         148. SysAD(49)
149. SCData(17)         150. SysAD(17)          151. SCData(113)        152. SCAddr(2)/Int*(1)
153. SCAddr(3)          154. SCData(81)         155. SCData(48)         156. SysAD(48)
157. SCData(16)         158. SysAD(16)          159. SCData(112)        160. SCAddr(4)/Int*(2)
161. SCAddr(5)          162. SCData(80)         163. SCAddr(6)          164. SCAddr(7)
165. SCAddr(8)          166. SCAddr(9)          167. SCAddr(10)         168. SCAddr(11)
169. SC64Addr           170. SCAddr(12)         171. SCAddr(13)         172. SCAddr(14)
173. SCAddr(15)         174. SCAddr(16)         175. SCAddr(17)         176. SCData(64)
177. SCAPar(0)          178. SCAPar(1)/Int*(3)  179. SCData(96)         180. SysAD(0)
181. SCData(0)          182. SysAD(32)          183. SCData(32)         184. SCData(65)
185. SCAPar(2)          186. SCOE*/Int*(4)      187. SCData(97)         188. SysAD(1)
189. SCData(1)          190. SysAD(33)          191. SCData(33)         192. SCData(66)
193. SCDCS*             194. SCTCS*/Int*(5)     195. SCData(98)         196. SysAD(2)
197. SCData(2)          198. SysAD(34)          199. SCData(34)         200. SCTag(0)
201. SCWrW,X*‡          202. SCWrY,Z*‡          203. SCData(67)         204. SCTag(1)
205. SysCmd(0)          206. SCData(99)         207. SysAD(3)           208. SCData(3)
209. SysAD(35)          210. SCData(35)         211. SCData(68)         212. SCTag(2)
213. SysCmd(1)          214. SCData(100)        215. SysAD(4)           216. SCData(4)
217. SysAD(36)          218. SCData(36)         219. SCData(69)         220. SCTag(3)
221. SysCmd(2)          222. SCData(101)        223. SysAD(5)           224. SCData(5)

‡ These signals share the same JTAG bit. 
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Table 14-2 (cont.) JTAG Scan Order of R4000 Processor Pins 



Pin # Signal Name       Pin # Signal Name       Pin # Signal Name       Pin # Signal Name
225. SysAD(37)          226. SCData(37)         227. SCData(70)         228. WrRdy*
229. ModeClock          230. SCData(102)        231. SysAD(6)           232. SCData(6)
233. SysAD(38)          234. SCData(38)         235. SCData(71)         236. SCTag(4)
237. SysCmd(3)          238. SCData(103)        239. SysAD(7)           240. SCData(7)
241. SysAD(39)          242. SCData(39)         243. SCDChk(8)          244. SCTag(5)
245. SysCmd(4)          246. SCDChk(12)         247. SysADC(0)          248. SCDChk(0)
249. SysADC(4)          250. SCDChk(4)          251. SCData(72)         252. SCTag(6)
253. SysCmd(5)          254. SCData(104)        255. SysAD(8)           256. SCData(8)
257. SysAD(40)          258. SCData(40)         259. SCData(73)         260. SCTag(7)
261. SysCmd(6)          262. SCData(105)        263. SysAD(9)           264. SCData(9)
265. SysAD(41)          266. SCData(41)         267. SCData(74)         268. SCTag(8)
269. SysCmd(7)          270. SCData(106)        271. SysAD(10)          272. SCData(10)
273. SysAD(42)          274. SCData(42)         275. SCData(75)         276. SCTag(9)
277. SysCmd(8)          278. SCData(107)        279. SysAD(11)          280. SCData(11)
281. SysAD(43)          282. SCData(43)         283. SCData(76)         284. SCTag(10)
285. SysCmdP            286. SCData(108)        287. SysAD(12)          288. SCData(12)
289. SysAD(44)          290. SCData(44)         291. SCData(77)         292. SCTag(11)
293. Fault*             294. SCData(109)        295. SysAD(13)          296. SCData(13)
297. SysAD(45)          298. SCData(45)         299. SCTag(12)          300. TClock(1:0)‡
301. SCData(78)         302. SCTag(13)          303. SCData(110)        304. SysAD(14)
305. SCData(14)         306. SysAD(46)          307. SCData(46)         308. SCData(79)
309. SCTag(14)          310. SCData(111)        311. SysAD(15)          312. SCData(15)
313. SysAD(47)          314. SCData(47)         315. SCDChk(9)          316. SCTag(15)†

‡ These signals share the same JTAG bit. 

† See the section titled Boundary-Scan Register earlier in this chapter for a 
description of the last three output enable bits, 319:317. 
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14.4 Implementation-Specific Details 

This section describes details of JTAG boundary-scan operation that are 
specific to the processor. 

• The MasterClock, MasterOut, SyncIn, and SyncOut signal 
pads do not support JTAG. 

• The following pairs of output pads share a single JTAG bit: 

SCAddr0W and SCAddr0X 
SCAddr0Y and SCAddr0Z 
SCWrW* and SCWrX* 
SCWrY* and SCWrZ* 
TClock(0) and TClock(1) 
RClock(0) and RClock(1) 

• All input pad data is first latched into a processor clock-based 
register in the pad cell before it is captured into the 
Boundary-scan register in the Capture-DR (Boundary-scan) 
state. When the phase-locked loop is disabled, the processor 
clock is half the frequency of MasterClock. Therefore, when the 
TAP controller is in the Capture-DR (Boundary-scan) state, the 
data setup required at the input pads is more than two 
MasterClock periods before the rising edge of JTCK. 

• The output enable controls generated from the three most- 
significant bits of the Boundary-scan register are latched into a 
processor clock-based register before they actually enable the 
data onto the pads. Therefore, the delay from the rising edge 
of JTCK in the Update-DR (Boundary-scan) state to data valid 
at the output pins of the chip is greater than two MasterClock 
periods. 
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The R4000 processor supports the following interrupts: six hardware 
interrupts, one internal "timer interrupt," two software interrupts, and 
one nonmaskable interrupt. The processor takes an exception on any 
interrupt. 

This chapter describes the six hardware and single nonmaskable 
interrupts. A description of the software and the timer interrupts can be 
found in Chapter 5. CPU exception processing is also described in 
Chapter 5. 

Floating-point exception processing is described in Chapter 7. 
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15.1 Hardware Interrupts 

The six CPU hardware interrupts can be caused by external write requests 
to the R4000SC, R4000MC, and R4000PC, or can be caused through 
dedicated interrupt pins. These pins are latched into an internal register 
by the rising edge of SClock. The R4000MC and R4000SC packages 
support a single interrupt pin, Int*(0). The R4000PC package supports six 
interrupt pins, Int*(5:0). 



15.2 Nonmaskable Interrupt (NMI) 

The nonmaskable interrupt is caused either by an external write request to 
the R4000 or by a dedicated pin in the R4000. This pin is latched into an 
internal register by the rising edge of SClock. 



15.3 Asserting Interrupts 



External writes to the CPU are directed to various internal resources, 
based on an internal address map of the processor. When SysAD[6:4] = 0, 
an external write to any address writes to an architecturally transparent 
register called the Interrupt register; this register is available for external 
write cycles, but not for external reads. 

During a data cycle, SysAD[22:16] are the write enables for the seven 
individual Interrupt register bits and SysAD[6:0] are the values to be 
written into these bits. This allows any subset of the Interrupt register to 
be set or cleared with a single write request. Figure 15-1 shows the 
mechanics of an external write to the Interrupt register. 
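
The enable/value split described above can be illustrated with a short sketch. It only models the data word of the write: SysAD[22:16] carry the per-bit write enables and SysAD[6:0] carry the values. The function names `interrupt_write_word` and `apply_interrupt_write` are hypothetical helpers for illustration, not part of any processor interface.

```python
def interrupt_write_word(set_bits=(), clear_bits=()):
    """Build the SysAD data word that sets and clears Interrupt register bits 6:0."""
    if set(set_bits) & set(clear_bits):
        raise ValueError("a bit cannot be both set and cleared in one write")
    word = 0
    for bit in list(set_bits) + list(clear_bits):
        if not 0 <= bit <= 6:
            raise ValueError("Interrupt register bits are numbered 0 to 6")
        word |= 1 << (16 + bit)      # write enable in SysAD[22:16]
    for bit in set_bits:
        word |= 1 << bit             # value 1 in SysAD[6:0]; cleared bits stay 0
    return word


def apply_interrupt_write(register, word):
    """Model the effect of the write on a 7-bit Interrupt register value."""
    enables = (word >> 16) & 0x7F
    values = word & 0x7F
    # Enabled bits take the written value; all other bits keep their old state.
    return ((register & ~enables) | (values & enables)) & 0x7F


word = interrupt_write_word(set_bits=[0, 6], clear_bits=[3])
updated = apply_interrupt_write(0b0001000, word)
```

Because each bit has its own enable, bits that are not named in the write keep their previous state, which is what allows an arbitrary subset to be changed in one request.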
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See Figures 15-1 , 
15-2, and 15-3. 



Figure 15-1 Interrupt Register Bits and Enables 
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Figure 15-2 shows how the R4000SC and R4000MC interrupts are readable 
through the Cause register. 

• Bit 5 of the Interrupt register in the R4000SC and R4000MC is 
multiplexed with the TimerInterrupt signal and the result is 
directly readable as bit 15 of the Cause register. 

• Bits 4:1 of the Interrupt register are directly readable as bits 
14:11 of the Cause register. 

• Bit 0 of the Interrupt register is latched into the internal register 
by the rising edge of SClock, then ORed with the Int*(0) pin, 
and the result is directly readable as bit 10 of the Cause register. 
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See Figure 15-5. 
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Figure 15-2 R4000SC/MC Interrupt Signals 






The select line for the Timer Interrupt multiplexer is enabled by boot- 
mode bit 19, TimerIntDis, as described in Chapter 9. The Timer Interrupt 
input to the multiplexer is asserted when the Count register equals the 
Compare register. 






Figure 15-3 shows how the R4000PC interrupts are readable through the 
Cause register. The interrupt bits, Int*(5:0), are latched into the internal 
register by the rising edge of SClock. 

• Bit 5 of the Interrupt register in the R4000PC is ORed with the 
Int*(5) pin and then multiplexed with the TimerInterrupt 
signal. This result is directly readable as bit 15 of the Cause 
register. 

• Bits 4:0 of the Interrupt register are bit-wise ORed with the 
current value of the interrupt pins Int*[4:0] and the result is 
directly readable as bits 14:10 of the Cause register. 
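
As a rough illustration of the R4000PC data path just described, the sketch below computes Cause register bits 15:10 from the Interrupt register, the pin states, and the timer. It makes several assumptions that are not stated in the text: `pins_asserted` is a 6-bit mask in which a 1 means the corresponding active-low Int* pin is asserted, and `select_timer` stands in for the multiplexer select, whose polarity relative to TimerIntDis is not modeled. The function name is hypothetical.

```python
def r4000pc_cause_ip(interrupt_register, pins_asserted, timer_interrupt, select_timer):
    """Return Cause register bits 15:10, already shifted into position."""
    # OR the low six Interrupt register bits with the asserted pins (bit 6 is NMI).
    combined = (interrupt_register | pins_asserted) & 0x3F

    # Bits 4:0 of the combined value are readable as Cause bits 14:10.
    low = combined & 0x1F

    # Bit 5 is multiplexed with the timer interrupt to form Cause bit 15.
    if select_timer:
        high = 1 if timer_interrupt else 0
    else:
        high = (combined >> 5) & 1

    return (high << 15) | (low << 10)


cause_bits = r4000pc_cause_ip(0b000001, 0b000100, False, False)
```

In this model a pending bit can come from either an external write or a pin, and software reading the Cause register cannot distinguish the two sources.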
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Figure 15-3 R4000PC Interrupt Signals 



See 
Figure 15-5. 
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Figure 15-4 shows the internal derivation of the NMI signal, for all 

versions of the R4000 processor. 

The NMI* pin is latched by the rising edge of SClock; however, the NMI 
exception occurs in response to the falling edge of the NMI* signal and is 
not level-sensitive. 

Bit 6 of the Interrupt register is then ORed with the inverted value of NMI* 
to form the nonmaskable interrupt. 

Figure 15-4 R4000 Nonmaskable Interrupt Signal 

Figure 15-5 shows the masking of the R4000 interrupt signal. 

• Cause register bits 15:8 (IP7-IP0) are AND-ORed with Status 
register interrupt mask bits 15:8 (IM7-IM0) to mask individual 
interrupts. 

• Status register bit 0 is a global Interrupt Enable (IE). It is 
ANDed with the output of the AND-OR logic to produce the 
R4000 interrupt signal. 
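
The masking logic above reduces to a few bit operations. The following is a minimal sketch that takes the Cause and Status register values as plain integers; the function name `interrupt_signal` is illustrative only.

```python
def interrupt_signal(cause, status):
    """Return True when the masked, globally enabled interrupt signal is asserted."""
    ip = (cause >> 8) & 0xFF     # Cause bits 15:8, IP7-IP0
    im = (status >> 8) & 0xFF    # Status bits 15:8, IM7-IM0
    ie = status & 1              # Status bit 0, global Interrupt Enable
    # AND each pending bit with its mask bit, OR the results, then AND with IE.
    return bool(ie and (ip & im))


timer_taken = interrupt_signal(cause=0x8000, status=0x8001)
```

An interrupt that is pending but masked, or pending while IE is 0, remains visible in the Cause register without producing the interrupt signal.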
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Figure 15-5 Masking of the R4000 Interrupt 
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16 



This chapter describes the Error Checking and Correcting (ECC) 
mechanism used in both the R4000 and R4400 processors. 

This chapter also contains a description of the Master/Checker mode used 
in the R4400 processor. 
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16.1 Error Checking in the Processor 



ECC code allows the processor to detect and sometimes correct errors 
made when moving data from one place to another. 

Two major types of data errors can occur in data transmission: 

• hard errors, which are permanent, arise from broken 
interconnects, internal shorts, or open leads 

• soft errors, which are transient, are caused by system noise, 
power surges, and alpha particles. 

Hard errors must be corrected by physical repair of the damaged 
equipment and restoration of data from backup. Soft errors can be 
corrected by using error checking and correcting codes. 



Types of Error Checking 

The processor uses two types of error checking: parity (error detection 
only), and single-bit error correction/double-bit error detection 
(SECDED). 

Parity Error Detection 

Parity is the simplest error detection scheme. By appending a bit, called 
a parity bit, to the end of an item of data, single-bit errors can be 
detected; however, these errors cannot be corrected. 

There are two types of parity: 

• Odd Parity adds 1 to any even number of 1s in the data, 
making the total number of 1s odd (including the parity bit). 

• Even Parity adds 1 to any odd number of 1s in the data, 
making the total number of 1s even (including the parity bit). 

Odd and even parity are shown in the example below: 



Data(3:0)    Odd Parity Bit    Even Parity Bit 
0 0 1 0      0                 1 
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The example above shows a single bit in Data(3:0) with a value of 1; this 
bit is Data(1). 

• In even parity, the parity bit is set to 1. This makes 2 (an even 
number) the total number of bits with a value of 1. 

• Odd parity makes the parity bit a 0 to keep the total number of 
1-value bits an odd number; in the case shown above, that is the 
single bit Data(1). 

The example below shows odd and even parity bits for various data 
values: 

Data(3:0)    Odd Parity Bit    Even Parity Bit 
0 1 1 0      1                 0 
0 0 0 0      1                 0 
1 1 1 1      1                 0 
1 1 0 1      0                 1 

Parity allows single-bit error detection, but it does not indicate which bit 
is in error. For example, suppose an odd-parity value of 00011 arrives. 
The last bit is the parity bit, and since odd parity demands an odd number 
(1, 3, 5) of 1s, this data is in error: it has an even number of 1s. However, it 
is impossible to tell which bit is in error. To resolve this problem, SECDED 
ECC was developed. 
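
The parity rules above can be expressed in a few lines. This is a minimal sketch; the function names are illustrative, and the data values are the 4-bit examples used in this section.

```python
def even_parity_bit(data):
    """Parity bit that makes the total number of 1s (data plus parity) even."""
    return bin(data).count("1") & 1


def odd_parity_bit(data):
    """Parity bit that makes the total number of 1s (data plus parity) odd."""
    return even_parity_bit(data) ^ 1


def odd_parity_ok(word_with_parity):
    """Check a received word whose parity bit is included in the value."""
    return bin(word_with_parity).count("1") % 2 == 1


# Data(3:0) = 0010: odd parity bit 0, even parity bit 1.
example = (odd_parity_bit(0b0010), even_parity_bit(0b0010))

# The received odd-parity value 00011 has an even number of 1s, so it is in error.
received_ok = odd_parity_ok(0b00011)
```

The check reports only that some bit is wrong; nothing in the result identifies the failing bit.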



SECDED ECC Code 

The ECC code chosen for processor secondary cache data and tag is single- 
bit error correction and double-bit error detection (SECDED) code.† 
SECDED ECC code is an improvement upon the parity scheme; not only 
does it detect single- and certain multi-bit errors, it corrects single-bit 
errors. 



† The 64-bit data code is a modification of one of the 64-bit codes proposed by M. Y. Hsiao, 
to include the ability to detect 3- and 4-bit errors within a nibble. The 25-bit tag code was 
created using the patterns observed in the 64-bit data code. 
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Secondary Cache Data Bus SECDED Code 

The SECDED code protecting the secondary cache data bus has the properties 
listed below: 

• It corrects single-bit errors. 

• It detects double-bit errors. 

• It detects 3- or 4-bit errors within a nibble.† 

• It provides 64 data bits protected by 8 check bits, and yields 8- 
bit syndromes (the syndrome is a generated value used to detect 
an error, and locate the position of the single bit in error). 

• It is a minimal-length code; each parity tree used to generate 
the 8-bit syndrome has only 27 inputs, the minimum number 
possible. 

• It provides byte Exclusive-ORs (XORs) of the data bits as part 
of the XOR trees used to build the parity generators. This 
allows selection of byte parity out of the XOR trees that 
generate or check the code. 

• Single-bit errors are indicated either by syndromes that contain 
exactly three 1s, or by syndromes that contain exactly five 1s 
(in which bits 0-3 or bits 4-7 of the syndrome are all 1s).‡ 

• Double-bit errors are indicated by syndromes that contain an 
even number of 1s. 

• 3-bit errors within a nibble are indicated by syndromes that 
contain five 1s, in which bits 0-3 of the syndrome and bits 4-7 
of the syndrome are not all 1s. 

• 4-bit errors within a nibble are indicated by syndromes that 
contain four 1s. Because this is an even number of 1s, 4-bit 
errors within a nibble look like double-bit errors. 



† A nibble is defined here as any group of four bits located within the vertical rules of Figure 
16-1. 

‡ This makes it possible to decode the syndrome to find which data bit is in error, using 4- 
input NAND gates, provided a pre-decode AND of bits 0-3 and bits 4-7 of the syndrome 
is available. For the check bits, a full 8-bit decode of the syndrome is required. 
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Secondary Cache Tag Bus SECDED Code 

The SECDED ECC code protecting the secondary cache tag bus has the 
following properties: 

• It corrects single-bit errors. 

• It detects double-bit errors. 

• It detects 3- or 4-bit errors within a nibble. 

• It provides 25 data bits protected by 7 check bits, yielding 7-bit 
syndromes. 

• It provides byte XORs of the data bits as part of the XOR trees 
used to build the parity generators. This allows selection of 
byte parity out of the XOR trees that generate or check the 
code. 

• Single-bit errors are indicated by syndromes that contain 
exactly three 1s. This makes it possible to decode the 
syndrome to find which data bit is in error with 3-input NAND 
gates. For the check bits, a full 7-bit decode of the syndrome is 
required. 

• Double-bit errors are indicated by syndromes that contain an 
even number of 1s. 

• 3-bit errors within a nibble are indicated by syndromes that 
contain either five 1s or seven 1s. 

• 4-bit errors within a nibble are indicated by syndromes that 
contain either four 1s or six 1s. Because these are even 
numbers of 1s, 4-bit errors within a nibble look like double-bit 
errors. 
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Error Checking Operation 

The processor verifies data correctness by using either parity or the 
SECDED code as it passes data from the System interface to the secondary 
cache, or as it moves data from the secondary cache to the primary caches or 
to the System interface. 

System Interface 

The processor generates correct check bits for doubleword, word, or 
partial-word data transmitted to the System interface. If the System 
interface is set to ECC mode, the processor checks the data for correctness 
and passes the data check bits from the secondary cache to the System 
interface unchanged. If the System interface is set to parity mode, 
the processor indicates a secondary cache ECC error by corrupting the 
state of the SysCmdP signal. 

The processor does not check data received from the System interface for 
external updates and external writes. By setting the SysCmd(4) bit in the 
data identifier, it is possible to prevent the processor from checking read 
response data from the System interface. 

The processor does not check addresses received from the System 
interface, but does generate correct check bits for addresses transmitted to 
the System interface. 

The processor does not contain a data corrector; instead, the processor 
takes a cache error exception when it detects an error based on data check 
bits. Software, in conjunction with an off-processor data corrector, is 
responsible for correcting the data when SECDED code is employed. 

Secondary Cache Data Bus 

The 16 check bits, SCDChk(15:0), for the 128-bit secondary cache data bus 
are organized as 8 check bits for the upper 64 bits of data, and 8 check bits 
for the lower 64 bits of data. 

System Interface and Secondary Cache Data Bus 

The 8 check bits, SysADC(7:0), for the System interface address and data 
bus provide even-byte parity, or are generated in accordance with a 
SECDED code that also detects any 3- or 4-bit error in a nibble. The 8 check 
bits for each half of the secondary cache data bus are always generated in 
accordance with the SECDED code. 
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Secondary Cache Tag Bus 

The 7 check bits, SCTChk(6:0), for the secondary cache tag bus are 
generated in accordance with the SECDED code, which also detects any 3- 
or 4-bit error in a nibble. 

The processor generates check bits for the tag when it is written into the 
secondary cache and checks the tag whenever the secondary cache is 
accessed. 

The processor contains a corrector for the secondary cache tag; the tag 
corrector is not in-line for processor accesses due to primary cache misses. 
The processor traps when a tag error is detected on a processor access due 
to a primary cache miss. Software, using the processor cache management 
primitives, is responsible for correcting the tag. When executing the cache 
management primitives, the processor uses the corrected tag to generate 
write back addresses and cache state. 

For external accesses, the tag corrector is in-line; that is, the response to 
external accesses is based on the corrected tag. The processor still traps on 
tag errors detected during external accesses to allow software to repair the 
contents of the cache if possible. 

System Interface Command Bus 

In the R4000 processor, the System interface command bus has a single 
parity bit, SysCmdP, that provides even parity over the 9 bits of this bus. 
The SysCmdP parity bit is generated when the System interface is in 
master state, but it is not checked when the System interface is in slave 
state. In the R4400 processor, input parity is reported through the Fault* 
pin. 

When the System interface is set to parity mode, the processor indicates a 
secondary cache ECC error by corrupting the state of the SysCmdP signal. 
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SECDED ECC Matrices for Data and Tag Buses 

The check matrices for data and tags, specifying the distribution of data 
and check bits across nibbles, are shown in Figures 16-1 and 16-4. 

The data bits in Figure 16-1 correspond to SysAD(63:0), SCData(127:64), 
or SCData(63:0). The check bits in Figure 16-1 correspond to 
SysADC(7:0), SCDChk(15:8), or SCDChk(7:0). 

The check bits in Figure 16-4, shown later in this chapter, correspond to 
SCTChk(6:0) and the data bits in Figure 16-4 correspond to SCTag(24:0). 

The parity check matrices shown in these two figures generate the ECC 
code for a fixed-width data word; they can also locate the data bit in error. 
In Figure 16-1, the data word length is 64 bits; in Figure 16-4, the data word 
length is 25 bits. 

ECC Check Bits 

The R4000 processor provides the following check bits: 16 check bits, 
SCDChk(15:0), are used for the secondary cache data bus; 7 check bits, 
SCTChk(6:0), are used for the secondary cache tag bus; 8 check bits, 
SysADC(7:0), are used for the System interface address and data bus; a 
single parity bit, SysCmdP, is used for the System interface command bus. 

In the R4400 processor, the Fault* pin reports data parity or any ECC 
errors received from the System interface during an external update or an 
external write. The Fault* pin also reports errors among the address bits 
received from the System interface. In each case, the full 64-bit data and 8- 
bit ECC are significant. This checking is not affected by the state of the 
disable bit [SysCmd(4)] in the data identifier. No exceptions are generated 
for these checks. 
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Data ECC Generation 

Each of the 64 data bits and 8 check bits has a unique 8-bit SECDED ECC 
check code. The check bits for a data word are generated by taking the even 
parity of the ECC check codes of the data bits that have a value of 1. As 
Figure 16-1 shows, bit locations are numbered from right to left in 
ascending order, from data bit 0 (furthest right) to data bit 63 (furthest 
left). For example, data bit 0, in the far right column of Figure 16-1, has 
an 8-bit check value of 0001 0011₂ (0s are represented in this figure by 
periods, (.), because they are not used in the calculations). 

Figure 16-1 also gives values for the 8 check bits, 7:0. For instance, the 8- 
bit SECDED ECC code for check bit 6 is in column 6, near the right hand 
edge of Figure 16-1. 
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Figure 16-1 Check Matrix for Data ECC Code 

NOTE: * This row indicates the number of 1 s in the generated syndrome for each data 
bit in error. 
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As an example of this process, SECDED ECC for Data(63:0) = 0x0000 0000 
0000 0001 is generated in the steps below. 

1. Find any bits in Data(63:0) having a value of 1. 

To determine this, the 16-digit hexadecimal value 0x0000 0000 
0000 0001 must be expanded to its 64-bit binary equivalent before 
locating the data bit(s) with a value of 1. In this case, the only 1- 
value in 0x0000 0000 0000 0001 is in column 0. 

2. Find the check bits in column 0. 
They are 0001 0011₂. 

3. Take even parity of check bits 0001 0011₂. 

            ECC    Parity (even) 
MSB (7)     0      0 
    (6)     0      0 
    (5)     0      0 
    (4)     1      1 
    (3)     0      0 
    (2)     0      0 
    (1)     1      1 
LSB (0)     1      1 

4. This even parity value, 0001 0011₂, is sent out over the bus as ECC 
check bits, ECC(7:0). 
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The following example uses data with several 1-value bits: Data(63:0) = 

0x0000 0000 0000 0043. 

1. Expand the data to its binary equivalent in order to generate the 
ECC check bits. 

0x0000 0000 0000 0043 has 1s in the last byte only. The last byte 
binary value is: 0x43 = 0100 0011₂. 

column # 7654 3210 
0x0043 = 0100 0011₂ 

Since only columns 0, 1, and 6 have 1s, they are the only columns 
that can generate the even parity bits. 

2. Using Figure 16-1, generate even parity for the ECC check codes 
in columns 0, 1, and 6: 

            Column 0 ECC   Column 1 ECC   Column 6 ECC   Parity (even) 
MSB (7)     0              0              0              0 
    (6)     0              0              0              0 
    (5)     0              1              0              1 
    (4)     1              0              0              1 
    (3)     0              0              1              1 
    (2)     0              0              1              1 
    (1)     1              1              0              0 
LSB (0)     1              1              1              1 

3. This parity value, 0011 1101₂, is sent out over the ECC(7:0) check 
bus. 
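
The generation procedure in these two examples amounts to XORing together the column check codes of every data bit that is set. The sketch below is illustrative only: `COLUMN_CODES` holds just the three column codes that appear in the worked examples (columns 0, 1, and 6), whereas a complete implementation would need all 64 entries from Figure 16-1. The function name `generate_ecc` is hypothetical.

```python
# Check codes for the data columns quoted in the worked examples.
# The remaining columns would have to be filled in from Figure 16-1.
COLUMN_CODES = {
    0: 0b00010011,
    1: 0b00100011,
    6: 0b00001101,
}


def generate_ecc(data, column_codes=COLUMN_CODES):
    """XOR (even parity) of the check codes of every data bit that is 1."""
    ecc = 0
    bit = 0
    while data:
        if data & 1:
            # Raises KeyError for a column that is missing from the partial table.
            ecc ^= column_codes[bit]
        data >>= 1
        bit += 1
    return ecc


ecc_single = generate_ecc(0x0000000000000001)   # only column 0 contributes
ecc_multi = generate_ecc(0x0000000000000043)    # columns 0, 1, and 6 contribute
```

Taking even parity row by row across the selected columns is the same operation as a bitwise XOR of the column codes, which is why a single accumulator is enough.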
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Detecting Data Transmission Errors 

The following procedure detects data transmission errors. 

1. System A transmits a 64-bit doubleword together with 8 bits of 
SECDED ECC (see Figure 16-2). 



[Figure 16-2 labels: System A (ECC Generator), Data(63:0), ECC(7:0), System B] 

Figure 16-2 Detecting ECC Errors: Transmitting Data and ECC 

2. System B receives the data doubleword, together with the byte of 
ECC check code. 

3. To verify proper transmission of the 64-bit doubleword and 8-bit 
ECC check code, System B generates its own 8-bit ECC check code 
from the 64-bit doubleword of System A, as shown in Figure 16-3. 

4. System B executes an Exclusive-OR (XOR) on the check bits of 
System A with its own newly-generated ECC check bits (see 
Figure 16-3). The output of this XOR is called the syndrome. 




Figure 16-3 Detecting ECC Errors: Deriving the Syndrome 



5. If the syndrome is 0000 0000₂, the data System B received, together 
with the newly-generated ECC check bits from System B, are the 
same as the data and check bits from System A. If the syndrome 
is any value other than 0000 0000₂, it is assumed that either the 
received word or the received check bits are in error. 
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6. Using the data in Figure 16-1, it may be possible to correct either 
the data bit or check bit in error. Determine if the syndrome is in 
Figure 16-1 by counting the number of 1s in the syndrome. 

• If the syndrome contains either one, three, or five 1s, 
the syndrome is in Figure 16-1. Three or five 1s 
indicates that at least one data bit is in error. A single 1 
indicates an ECC check bit is in error. 

• If the syndrome contains two 1s, a double-bit error has 
been detected, located in two consecutive bits of a 
nibble. This is not correctable. 

• If the syndrome contains four 1s, a 4-bit error has been 
detected, located in four consecutive bits of a nibble. 
This is not correctable. 

If the syndrome is identical to any of the syndromes in Figure 
16-1, the column number of that data or check bit indicates the 
location of the bit in error. The bit that is in error is corrected by 
inverting its state (a 1 is changed to 0; a 0 is changed to 1). 
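
The counting rules can be gathered into one classification routine. This sketch follows the properties listed for the 64-bit data code earlier in this chapter (three 1s, or five 1s with bits 0-3 or bits 4-7 all set, indicate a single data bit; five 1s otherwise indicate a 3-bit error within a nibble). The function name and the returned strings are illustrative, and the handling of a syndrome with seven 1s is an assumption, since the text does not cover that case for the data code.

```python
def classify_syndrome(syndrome):
    """Classify an 8-bit data-bus syndrome by the number of 1s it contains."""
    syndrome &= 0xFF
    ones = bin(syndrome).count("1")
    low_all = (syndrome & 0x0F) == 0x0F     # bits 0-3 all 1s
    high_all = (syndrome & 0xF0) == 0xF0    # bits 4-7 all 1s

    if ones == 0:
        return "no error"
    if ones == 1:
        return "single check-bit error"
    if ones == 3 or (ones == 5 and (low_all or high_all)):
        return "single data-bit error"
    if ones == 5:
        return "3-bit error within a nibble"
    if ones % 2 == 0:
        return "double-bit error (or 4-bit error within a nibble)"
    return "unclassified"                   # seven 1s: not described in the text


kind = classify_syndrome(0b00010011)
```

Only the two single-bit classes lead to a correction; every other nonzero syndrome is reported as detected but uncorrectable.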

The following sections show how to use the check matrices in Figure 16-1 
for detecting: 

• single data bit error 

• single data check bit error 

• multiple data bit errors (2 consecutive bits in a nibble) 

• multiple data bit errors (3 consecutive bits in a nibble) 

• multiple data bit errors (4 consecutive bits in a nibble) 
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Single Data Bit ECC Error 

The following procedure detects and corrects a single data bit ECC error. 

1. System A transmits: 
Data(63:0) = 0x0000 0000 0000 0000 

and 

ECC(7:0) check code = 0000 0000₂ 

2. System B receives the following incorrect data: 
Data(63:0) = 0x0000 0000 0000 0001 

and 

ECC(7:0) check code = 0000 0000₂ 

3. System B regenerates ECC for the received data. The correct ECC 
check code for: 

Data(63:0) = 0x0000 0000 0000 0001 
is 

ECC(7:0) = 0001 0011₂ 

4. A syndrome is generated by the XOR of the System A check bits, 
0000 0000₂, and the System B regenerated check bits, 0001 0011₂. 
The resulting syndrome is 0001 0011₂. Since the syndrome has 
three 1s, look for the column with three 1s in the parity check 
matrix table. 

5. Searching the matrix (Figure 16-1) shows that the syndrome, 0001 
0011₂, corresponds to data bit 0. This means the state of received 
data bit 0 is incorrect. 

6. To correct the error, the system inverts the state of the received 
data bit 0 from a value of 1 to 0. 
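
The detect-and-correct sequence for a single data bit can be sketched end to end. As a simplification, the lookup table below contains only the three data-column codes quoted in this chapter's examples (columns 0, 1, and 6); the full table is Figure 16-1. The names `generate_ecc` and `correct_single_data_bit` are hypothetical, and the sketch corrects data bits only, not check bits.

```python
# Partial copy of the data-column check codes (columns 0, 1, and 6 only).
COLUMN_CODES = {0: 0b00010011, 1: 0b00100011, 6: 0b00001101}


def generate_ecc(data):
    """XOR the check codes of all data bits that are 1."""
    ecc = 0
    for bit, code in COLUMN_CODES.items():
        if (data >> bit) & 1:
            ecc ^= code
    if data & ~sum(1 << bit for bit in COLUMN_CODES):
        raise KeyError("data uses a column missing from the partial table")
    return ecc


def correct_single_data_bit(data, received_ecc):
    """Return (corrected data, syndrome) for a received doubleword and check code."""
    syndrome = received_ecc ^ generate_ecc(data)
    if syndrome == 0:
        return data, syndrome                 # nothing to correct
    for bit, code in COLUMN_CODES.items():
        if code == syndrome:
            return data ^ (1 << bit), syndrome   # invert the bit in error
    raise ValueError("syndrome does not match a known data column")


# System A sent all zeros; System B received data bit 0 flipped to 1.
corrected, syndrome = correct_single_data_bit(0x0000000000000001, 0b00000000)
```

The syndrome identifies the failing column directly, so the correction is a single bit inversion with no retransmission.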






Single Check Bit ECC Error 

The following procedure detects and corrects a single check bit ECC error. 

1. System A transmits: 
Data(63:0) = 0x0000 0000 0000 0000 

and 

ECC(7:0) check code = 0000 0000₂ 

2. System B receives the following incorrect check code: 
Data(63:0) = 0x0000 0000 0000 0000 

and 

ECC(7:0) check code = 0000 0001₂ 

3. System B regenerates the ECC for the received data. The correct 
ECC check code for: 

Data(63:0) = 0x0000 0000 0000 0000 
is 

ECC(7:0) = 0000 0000₂ 

4. A syndrome is generated by the XOR of the System A check bits, 
0000 0001₂, and the System B regenerated check bits, 0000 0000₂. 
The resulting syndrome is 0000 0001₂. 

Since the syndrome has a single 1, it is contained in the check 
matrix. Figure 16-1 shows that the syndrome, 0000 0001₂, 
corresponds to check bit 0. This indicates that the state of the 
received check bit 0 is incorrect. To correct the error, the system 
inverts the state of the received check bit 0 from a value of 1 to 0. 

Double Data Bit ECC Errors

The following procedure detects double data bit ECC errors.

1. System A transmits:
Data(63:0) = 0x0000 0000 0000 0000

and

ECC(7:0) check code = 0000 0000₂

2. System B receives the following incorrect data:
Data(63:0) = 0x0000 0000 0000 0011

and

ECC(7:0) check code = 0000 0000₂

3. System B regenerates the ECC for the received data. The correct
ECC check code for:

Data(63:0) = 0x0000 0000 0000 0011
is

ECC(7:0) = 0011 0000₂

4. A syndrome is generated by the XOR of the System A check bits,
0000 0000₂, and the System B regenerated check bits, 0011 0000₂.
The resulting syndrome is 0011 0000₂.

The syndrome of two 1s (or an even number of 1s) indicates that a
double-bit error has been detected. Double-bit errors cannot be
corrected.

Three Data Bit ECC Errors

The following procedure detects three data bit errors that occur within a
nibble.

1. System A transmits:
Data(63:0) = 0x0000 0000 0000 0000

and

ECC(7:0) check code = 0000 0000₂

2. System B receives the following incorrect data:
Data(63:0) = 0x0000 0000 0000 0111

and

ECC(7:0) check code = 0000 0000₂

3. System B regenerates the ECC for the received data. The ECC
check code for:

Data(63:0) = 0x0000 0000 0000 0111
is

ECC(7:0) = 0111 0011₂

4. A syndrome is generated by the XOR of the System A check bits,
0000 0000₂, and the System B regenerated check bits, 0111 0011₂.
The resulting syndrome is 0111 0011₂.

The resulting syndrome has five 1s. Since no four of the 1s are
contained in check bits (7:4) or check bits (3:0), three errors have
occurred within a nibble. Triple-bit errors within a nibble cannot
be corrected.

Four Data Bit ECC Errors

The following procedure detects four data bit errors that occur within a
nibble.

1. System A transmits:
Data(63:0) = 0x0000 0000 0000 0000

and

ECC(7:0) check code = 0000 0000₂

2. System B receives the following incorrect data:
Data(63:0) = 0x0000 0000 0000 1111

and

ECC(7:0) check code = 0000 0000₂

3. System B regenerates the ECC for the received data. The ECC
check code for:

Data(63:0) = 0x0000 0000 0000 1111
is

ECC(7:0) = 1111 0000₂

4. A syndrome is generated by the XOR of the System A check bits,
0000 0000₂, and the System B regenerated check bits, 1111 0000₂.
The resulting syndrome is 1111 0000₂.

Since the resulting syndrome has four 1s (or an even number of
1s), this error is recognized as some variation of a double-bit error.
A 4-bit error within a nibble cannot be corrected.
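
The worked examples in this section all follow the same pattern: XOR the
received check bits with the regenerated check bits, then count the 1s in
the syndrome. The following Python sketch illustrates only that counting
step. The function names are illustrative and are not part of the R4000
documentation. The sketch does not contain the Figure 16-1 check matrix, so
for a syndrome that could identify a single data bit it only reports that a
matrix lookup is needed.

```python
def syndrome(received_check: int, regenerated_check: int) -> int:
    """XOR the 8 received check bits with the 8 regenerated check bits."""
    return (received_check ^ regenerated_check) & 0xFF


def count_ones(value: int) -> int:
    """Number of 1 bits in a non-negative integer."""
    return bin(value).count("1")


def classify(s: int) -> str:
    """Classify an 8-bit data ECC syndrome using the rules of the examples."""
    ones = count_ones(s)
    if ones == 0:
        return "no error"
    if ones == 1:
        return "single check bit error (correctable)"
    if ones % 2 == 0:
        return "double-bit error or variation (not correctable)"
    # Five 1s, with no four of them inside check bits (7:4) or (3:0),
    # indicates three errors within a data nibble.
    upper, lower = count_ones(s >> 4), count_ones(s & 0xF)
    if ones == 5 and upper != 4 and lower != 4:
        return "three-bit error within a nibble (not correctable)"
    # Remaining odd-weight syndromes must be matched against a matrix column;
    # this sketch does not model the matrix itself.
    return "look up syndrome column in check matrix"


print(classify(syndrome(0b00000000, 0b00010011)))
```

With the values from the single data bit example, the sketch requests a
matrix lookup, which in the text resolves to data bit 0.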

Tag ECC Generation



The 25-bit tag ECC check matrix is similar to the 64-bit data check matrix; 
the main difference is the number of check bits used and the manner in 
which the errors are decoded. Figure 16-4 shows the check matrix for the 
tag bits. 



[Figure 16-4: check matrix not reproduced. The rows are the seven ECC
code bits, MSB to LSB. The columns are the check bits and data bits 24
through 0. The final row gives the number of 1s in the syndrome*: three
for each data bit column and one for each check bit column.]

Figure 16-4 Check Matrix for the Tag ECC Code

NOTE: * This row indicates the number of 1s in the generated syndrome for each data
bit in error.

Summary of ECC Operations

ECC operations are summarized in Tables 16-1 through 16-4.



Table 16-1 Error Checking and Correcting Summary for Internal Transactions

| Bus | Secondary Cache to Primary Cache | Primary Cache to Secondary Cache | Uncached Load | Uncached Store |
|---|---|---|---|---|
| Processor or Secondary Cache Data | Checked; Trap on Error | Primary Cache parity checked; Trap on Error | From System Interface | Not Checked |
| Secondary Cache Data Check Bits | Checked; Trap on Error | Generated | NA | NA |
| Secondary Cache Tag and Check Bits | Checked; not corrected in Secondary cache; Trap on Error | NA | NA | NA |
| System Interface Address/Command and Check Bits: Transmit | NA | NA | Generated | Generated |
| System Interface Address/Command and Check Bits: Receive | NA | NA | Not Checked; reported to the Fault* pin | NA |
| System Interface Data | NA | NA | Checked; Trap on Error† | From Processor |
| System Interface Data Check Bits | NA | NA | Checked; Trap on Error† | Generated |

† If error level (ERL bit of the Status register) is 1, the error is reported to the Fault* pin.

Table 16-2 Error Checking and Correcting Summary for Internal Transactions

| Bus | Store to Shared Cache Line | Cache Instruction | Secondary Cache Load from System Interface | Cache Write to System Interface |
|---|---|---|---|---|
| Processor or Secondary Cache Data | NA | Check on cache writeback; Trap on Error | From System Interface unchanged | Checked; Trap on Error |
| Secondary Cache Data Check Bits | NA | Check on cache writeback; Trap on Error | From System Interface unchanged | Checked; Trap on Error |
| Secondary Cache Tag and Check Bits | Checked on read part of RMW†; correct Secondary cache tag; Trap on Error | Checked; corrected Secondary cache tag*; Trap on Error | Generated | Checked; not corrected; Trap on Error |
| System Interface Address, Command, and Check Bits: Transmit | Generated | Generated | Generated | Generated |
| System Interface Address, Command, and Check Bits: Receive | NA | NA | Not Checked | NA |
| System Interface Data | From Processor | From Primary or Secondary Cache | Checked; Trap on Error‡ | From Secondary Cache |
| System Interface Data Check Bits | Generated | From Primary or Secondary Cache | Checked; Trap on Error‡ | From Secondary Cache (SysCmdP signal corrupted if System interface set to parity mode) |

† Read-Modify-Write cycle

‡ If error level (ERL bit of the Status register) is 1, the error is reported to the Fault* pin.

* Only if the current CACHE op needs to modify and write back the tag.

Table 16-3 Error Checking and Correcting Summary for External Transactions

| Bus | Read Request | Write Request | Invalidate Request | Update Request |
|---|---|---|---|---|
| Processor or Secondary Cache Data | NA | NA | Not Checked | Checked on read part of RMW†; Trap on Error‡ |
| Secondary Cache Data Check Bits | NA | NA | Not Checked | Checked on read part of RMW†; Trap on Error‡; Generation on write part of RMW if written |
| Secondary Cache Tag and Check Bits | NA | NA | Checked on read part of RMW†; Trap on Error‡; Generation on write part of RMW if written | Checked on read part of RMW†; Trap on Error; Generation on write part of RMW if written |
| System Interface Address, Command and Check Bits: Transmit | Generated | NA | NA | NA |
| System Interface Address, Command and Check Bits: Receive | Not Checked; reported to the Fault* pin | Not Checked; reported to the Fault* pin | Not Checked; reported to the Fault* pin | Not Checked; reported to the Fault* pin |
| System Interface Data | From Processor | Checked; Trap on Error | Not Checked | Not Checked; reported to the Fault* pin |
| System Interface Data Check Bits | Generated | Checked; Trap on Error | Not Checked | Not Checked; reported to the Fault* pin |

† Read-Modify-Write cycle

‡ Only the pair of doublewords accessed on the read portion of RMW is checked.






Table 16-4 Error Checking and Correcting Summary for External Transactions

| Bus | Intervention Request Data Returned | Intervention Request State Returned | Snoop Request |
|---|---|---|---|
| Processor or Secondary Cache Data | Checked; Trap on Error | Not Checked | Not Checked |
| Secondary Cache Data Check Bits | Checked; Trap on Error | Not Checked | Not Checked |
| Secondary Cache Tag and Check Bits | Checked and corrected on read part of RMW†; Trap on Error; Generation on write part of RMW if written. | Checked and corrected on read part of RMW†; Trap on Error; Generation on write part of RMW if written. | Checked and corrected on read part of RMW†; Trap on Error; Generation on write part of RMW if written. |
| System Interface Address, Command, and Check Bits: Transmit | Generated | Generated | Generated |
| System Interface Address, Command, and Check Bits: Receive | Not Checked; reported to the Fault* pin | Not Checked; reported to the Fault* pin | Not Checked; reported to the Fault* pin |
| System Interface Data | From Secondary Cache | NA | NA |
| System Interface Data Check Bits | From Secondary Cache | NA | NA |

† Read-Modify-Write cycle

16.2 R4400 Master/Checker Mode

The R4400 processor supports four Master/Checker mode configurations,
which are designated by boot-mode bit settings: Complete Master,
Complete Listener, System Interface Master, and Secondary Cache Master.
The boot-mode bits, SIMasterMd (mode bit 18) and SCMasterMd (mode
bit 42), define Master/Checker configurations. Table 16-5 lists the
configurations encoded by these bits.



Table 16-5 Boot-Mode Bit Encodings of Master/Checker Modes

SCMasterMd   SIMasterMd   Mode
(Bit 42)     (Bit 18)
0            0            Complete Master
                          (required for single-chip operation)
1            1            Complete Listener
                          (paired with Complete Master)
1            0            System Interface Master (SIMaster)
0            1            Secondary Cache Master
                          (SCMaster, paired with SIMaster)



For a non-fault tolerant system, these bits must be set to 00₂. This is the
Complete Master mode.

In a fault tolerant system, there are two possible configurations using the 
Master-Listener and Cross-Coupled modes described in Table 16-5. These 
are referred to as lock-step configurations, and are described later in this 
section. 
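
The encoding in Table 16-5 can be expressed as a small lookup. The
following Python sketch is illustrative only: it assumes the boot-mode bits
are held in an integer with mode bit n at bit position n, which is a
hypothetical representation chosen for the example, and the function name
is not taken from the R4400 documentation.

```python
SI_MASTER_MD_BIT = 18  # SIMasterMd, mode bit 18
SC_MASTER_MD_BIT = 42  # SCMasterMd, mode bit 42

# Keys are (SCMasterMd, SIMasterMd), as in Table 16-5.
MASTER_CHECKER_MODES = {
    (0, 0): "Complete Master",
    (1, 1): "Complete Listener",
    (1, 0): "System Interface Master",
    (0, 1): "Secondary Cache Master",
}


def master_checker_mode(mode_bits: int) -> str:
    """Return the Master/Checker mode selected by mode bits 42 and 18."""
    sc_master = (mode_bits >> SC_MASTER_MD_BIT) & 1
    si_master = (mode_bits >> SI_MASTER_MD_BIT) & 1
    return MASTER_CHECKER_MODES[(sc_master, si_master)]


print(master_checker_mode(0))  # both bits clear
```

A non-fault tolerant system leaves both bits at 0, so the lookup returns
Complete Master.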

Connecting a System in Lock Step

By operating in lock step, a system with more than one R4400 processor
can be configured to improve data integrity. In such a configuration,
output signals and I/O buses used during output are connected in parallel
between the processors. One processor is defined at boot time as the bus
driver, and each remaining processor is defined as a bus monitor.
Starting with the assertion of Reset*, all microprocessors must be
synchronous, and execute identical operations on a cycle-by-cycle basis.
Each processor designated as a bus monitor compares the outputs and
buses at bus-cycle boundaries, and asserts the Fault*† signal on any
mismatch.

In a lock step operation, the following R4400 signal groups are connected
in parallel:

• System interface

• Secondary Cache interface (R4400SC and R4400MC only)

• Interrupt interface

The following signals are not connected in parallel:

• Initialization interface, ModeClock, ModeIn, and Reset* signals

• JTAG interface signals, JTDO and JTMS

• all Clock/Control interface signals except VssP and VccP

The remaining processor signals can be connected either in parallel or
independently.



† Fault* is a non-persistent signal which is synchronous with the System interface. Fault*
signal timing is determined by the PClock-to-SClock divisor from boot-time mode bit
settings.

Master-Listener Configuration

As shown in Figure 16-5, the Master-Listener lock step configuration pairs
a Complete Master (mode bits 42 and 18 = 00₂) with a Complete Listener
(mode bits 42 and 18 = 11₂). In this configuration, the Complete Listener
has disabled output drivers; otherwise, the two R4400 processors operate
identically, both receiving the same inputs. On all output cycles, the
Complete Listener compares data on the output and I/O buses with
expected data, and asserts the Fault* signal in the event of a
miscomparison.



[Figure 16-5: block diagram not reproduced. Labels: R4400 Complete
Master, External Agent, System Interface bus (SysAD), Secondary cache
bus (SCAddr), Fault*.]

Figure 16-5 Master-Listener Configuration of Master/Checker Mode

Cross-Coupled Checking Configuration

In the Cross-Coupled Checking configuration, one of the R4400 processors
drives the data bus pins and is labelled the System Interface Master (mode
bits 42 and 18 = 10₂). The other R4400 processor drives the ECC or parity
check pins on the same bus and is labelled the Secondary Cache Master
(mode bits 42 and 18 = 01₂). This is shown in Figure 16-6.

Both processors monitor the buses and indicate a miscomparison by
asserting their respective Fault* signals. The Fault* signal indicates error
conditions not specifically covered by R4400 processor exceptions.†



[Figure 16-6: block diagram not reproduced. Labels: R4400, External
Agent, System Interface bus (SysAD), Secondary cache bus (SCData,
SCTag), Secondary cache, Data Chk/Tag Chk, Fault*.]

Figure 16-6 Cross-Coupled Configuration of Master/Checker Mode



† This includes such errors as an input parity error at SysCmd.

The signals that are connected in parallel and driven from the System
Interface Master (1 in Figure 16-6) include:

• SysAD(63:0)

• SysCmd(8:0)

• SCAPar(2:0)

Signals that are connected in parallel and driven from the Secondary
Cache Master (2 in Figure 16-6) include:

• SysADC(7:0)

• SysCmdP

• ValidOut*

• Release*

• SCAddr(17:1)

• SCAddr0(W:Z)

• SCOE*

• SCWr(W:Z)*

• SCData(127:0)

• SCDChk(15:0)

• SCTag(24:0)

• SCTChk(6:0)

• SCDCS*

• SCTCS*

The fault detection mechanism associated with the Fault* pin does not
cause any exceptions; the processor continues to run normally regardless
of the state of the Fault* signal. It is up to external logic to handle an
asserted Fault* signal.

Fault Detection

Fault detection of an output miscomparison occurs at the end of the bus 
cycle (the length of the cycle is programmed at boot-mode time; see 
Chapter 9). When the R4400 processor is in master state, outputs at the 
System interface are checked at the end of every System interface cycle. At 
the Secondary Cache interface, outputs are checked at the end of each read 
or write cycle. 

SCAPar(2:0) transition and check times are delayed from the rest of the 
Secondary Cache interface by one PClock. SCAPar(2:0) transitions occur 
one PClock after SCAddr transitions, or when the R4400 is changing from 
a read cycle to a write cycle without an address change. SCAPar(2:0) 
signals do not follow the timing of SCWr* signals, which are set separately 
through the programming of the boot-time mode bits. 

The R4400 processor has an internal fault detection latency of 4 PClocks 
(clock cycles are described in Chapter 10), whereupon Fault* is 
synchronized with the System interface. An output fault detected and 
propagated through the R4400 processor internal fault logic in a prior 
System interface cycle is reported in the current cycle. 

In Complete Master mode, output fault reporting is disabled for the 
Secondary Cache interface, but enabled for the following System interface 
signals: SysCmd, SysCmdP, SysAD, SysADC, ValidOut*, and Release*. 

Reset Operation

When the R4400 processor is a Complete Listener, SIMaster, or SCMaster,
an assertion of Reset* after the initial boot sequence is significant.

If Reset* is asserted a second time and subsequently deasserted, the R4400
processor changes to Forced Complete Master mode and drives all
outputs.

If Reset* is asserted and deasserted a third time, the R4400 processor
returns to its prior mode, as programmed by the boot-mode bits.

On any subsequent assertion and deassertion of Reset*, the processor
alternates between the two modes described above: the mode determined
by boot-time mode bits if the Master/Checker mode is Complete Listener,
SIMaster, or SCMaster, or Forced Complete Master mode.

In Forced Complete Master mode, the Fault* pin reports all output faults,
not just faults of the System interface as are reported in Complete Master
mode.
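
The alternation described above behaves like a two-state toggle. The
following Python sketch models it under stated assumptions: each call to
reset_pulse() stands for one assertion and deassertion of Reset* after the
initial boot sequence, and a processor booted as Complete Master is assumed
to be unaffected, which the text does not state explicitly. The class and
method names are hypothetical.

```python
class ResetModel:
    """Toy model of Master/Checker mode changes across Reset* pulses."""

    FORCED = "Forced Complete Master"

    def __init__(self, boot_mode: str) -> None:
        self.boot_mode = boot_mode  # mode programmed by the boot-mode bits
        self.forced = False         # True while in Forced Complete Master

    @property
    def mode(self) -> str:
        return self.FORCED if self.forced else self.boot_mode

    def reset_pulse(self) -> str:
        """Model one Reset* assertion and deassertion after initial boot."""
        # Assumption: only Complete Listener, SIMaster and SCMaster toggle.
        if self.boot_mode != "Complete Master":
            self.forced = not self.forced
        return self.mode


cpu = ResetModel("Complete Listener")
print(cpu.reset_pulse())  # second Reset* overall
print(cpu.reset_pulse())  # third Reset* overall
```

The second Reset* moves the model to Forced Complete Master and the third
returns it to the boot-time mode, matching the sequence in the text.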

Fault History 

Two internal fault history bits, Output Fault History and Input Fault
History, record output faults and certain input faults reported through the
Fault* pin. These bits are cleared with each deassertion of Reset*.

The two fault history bits are readable when Reset* is asserted, and the
Fault* pin changes from reporting live faults to indicating which fault
history bit was set when Reset* was deasserted in the previous cycle. The
ModeIn pin acts as selector; if ModeIn = 0, Fault* indicates the inverted
state of the Output fault history bit. If ModeIn = 1, Fault* indicates the
inverted state of the Input fault history bit.

The fault history bits can be reset (cleared) while the R4400 processor is
running by asserting 1 to the ModeIn pin. Consequently, ModeIn must
be held to 0 to maintain the status of the fault history bits. Table 16-6
presents this information in tabular form.

Table 16-6 R4400 Fault History Bit Encoding

| Boot/Reset Controls | ModeIn Pin | Fault History Bits | Fault* Pin | Master/Checker Mode |
|---|---|---|---|---|
| VccOk just asserted (goes from 0 to 1) | Used as boot-mode bits; scan data | N/A | N/A | N/A |
| Reset* just deasserted (goes from 0 to 1) | N/A | Cleared to 0 | N/A | N/A |
| Reset* deasserted in normal operation | 0 | Set and latched, if fault occurs | Live faults are reported | N/A |
| Reset* deasserted in normal operation | 1 | | Live faults are reported | N/A |
| Reset* just asserted (goes from 1 to 0) | N/A | N/A | N/A | Changed, toggling between mode bits and Forced Complete Master |
| Reset* just asserted (R4400 is reset) | 0 | Output Fault History bit is connected to the Fault* pin | N/A | N/A |
| Reset* just asserted (R4400 is reset) | 1 | Input Fault History bit is connected to the Fault* pin | N/A | N/A |
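
The selection of a fault history bit while Reset* is asserted can be
sketched as follows. This is a minimal illustration, not a description of
the hardware implementation: the pin level is modeled as the integer 0 or 1,
a history bit of True means a fault was recorded, and the function name is
hypothetical.

```python
def fault_pin_during_reset(mode_in: int, output_history: bool,
                           input_history: bool) -> int:
    """Level on Fault* while Reset* is asserted.

    ModeIn = 0 selects the Output fault history bit and ModeIn = 1 selects
    the Input fault history bit. Fault* shows the inverted state of the
    selected bit.
    """
    selected = input_history if mode_in == 1 else output_history
    return 0 if selected else 1


# An output fault was recorded and no input fault was recorded.
print(fault_pin_during_reset(0, True, False))
print(fault_pin_during_reset(1, True, False))
```

Because the pin carries the inverted bit, a recorded fault appears as a 0
on the modeled pin.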




CPU Instruction Set Details 



A 



This appendix provides a detailed description of the operation of each 
R4000 instruction in both 32- and 64-bit modes. The instructions are listed 
in alphabetical order. 

Exceptions that may occur due to the execution of each instruction are 
listed after the description of each instruction. Descriptions of the 
immediate cause and manner of handling exceptions are omitted from the 
instruction descriptions in this appendix. 

Figures at the end of this appendix list the bit encoding for the constant
fields of each instruction, and the bit encoding for each individual
instruction is included with that instruction.

A.1 Instruction Classes

CPU instructions are divided into the following classes:

• Load and Store instructions move data between memory and
general registers. They are all I-type instructions, since the
only addressing mode supported is base register + 16-bit
immediate offset.

• Computational instructions perform arithmetic, logical and
shift operations on values in registers. They occur in both
R-type (both operands are registers) and I-type (one operand is
a 16-bit immediate) formats.

• Jump and Branch instructions change the control flow of a
program. Jumps are always made to absolute 26-bit word
addresses (J-type format), or register addresses (R-type), for
returns and dispatches. Branches have 16-bit offsets relative to
the program counter (I-type). Jump and Link instructions save
their return address in register 31.

• Coprocessor instructions perform operations in the
coprocessors. Coprocessor loads and stores are I-type.
Coprocessor computational instructions have coprocessor-dependent
formats (see the FPU instructions in Appendix B).
Coprocessor zero (CP0) instructions manipulate the memory
management and exception handling facilities of the processor.

• Special instructions perform a variety of tasks, including
movement of data between special and general registers, trap,
and breakpoint. They are always R-type.

A.2 Instruction Formats

Every CPU instruction consists of a single word (32 bits) aligned on a word
boundary. The major instruction formats are shown in Figure A-1.

I-Type (Immediate)

  Bits:   31..26   25..21   20..16   15..0
  Field:  op       rs       rt       immediate

J-Type (Jump)

  Bits:   31..26   25..0
  Field:  op       target

R-Type (Register)

  Bits:   31..26   25..21   20..16   15..11   10..6    5..0
  Field:  op       rs       rt       rd       shamt    funct

op          6-bit operation code
rs          5-bit source register specifier
rt          5-bit target (source/destination) or branch condition
immediate   16-bit immediate, branch displacement or address
            displacement
target      26-bit jump target address
rd          5-bit destination register specifier
shamt       5-bit shift amount
funct       6-bit function field

Figure A-1 CPU Instruction Formats
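
The field boundaries in Figure A-1 translate directly into shifts and
masks. The following Python sketch packs and unpacks the fields of a 32-bit
instruction word. The function names are illustrative. The example values
use the ADD encoding given with that instruction later in this appendix
(op = SPECIAL = 000000, funct = 100000).

```python
def encode_r_type(op: int, rs: int, rt: int, rd: int,
                  shamt: int, funct: int) -> int:
    """Pack R-type fields into a 32-bit instruction word."""
    return ((op & 0x3F) << 26) | ((rs & 0x1F) << 21) | ((rt & 0x1F) << 16) \
        | ((rd & 0x1F) << 11) | ((shamt & 0x1F) << 6) | (funct & 0x3F)


def decode_fields(word: int) -> dict:
    """Extract every field of Figure A-1 from a 32-bit instruction word.

    All three formats share the op field; which of the remaining fields
    are meaningful depends on the instruction format.
    """
    return {
        "op": (word >> 26) & 0x3F,
        "rs": (word >> 21) & 0x1F,
        "rt": (word >> 16) & 0x1F,
        "rd": (word >> 11) & 0x1F,
        "shamt": (word >> 6) & 0x1F,
        "funct": word & 0x3F,
        "immediate": word & 0xFFFF,     # I-type, bits 15..0
        "target": word & 0x03FFFFFF,    # J-type, bits 25..0
    }


# ADD rd=3, rs=1, rt=2
word = encode_r_type(0b000000, 1, 2, 3, 0, 0b100000)
print(hex(word))
```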

A.3 Instruction Notation Conventions

In this appendix, all variable subfields in an instruction format (such as rs, 
rt, immediate, etc.) are shown in lowercase names. 

For the sake of clarity, we sometimes use an alias for a variable subfield in 
the formats of specific instructions. For example, we use rs = base in the 
format for load and store instructions. Such an alias is always lower case, 
since it refers to a variable subfield. 

Figures with the actual bit encoding for all the mnemonics are located at 
the end of this Appendix, and the bit encoding also accompanies each 
instruction. 

In the instruction descriptions that follow, the Operation section describes 
the operation performed by each instruction using a high-level language 
notation. The R4000 can operate as either a 32- or 64-bit microprocessor 
and the operation for both modes is included with the instruction 
description. 

Special symbols used in the notation are described in Table A-1. 

Table A-1 CPU Instruction Operation Notations

Symbol          Meaning

←               Assignment.

||              Bit string concatenation.

x^y             Replication of bit value x into a y-bit string. Note: x is
                always a single-bit value.

x_{y..z}        Selection of bits y through z of bit string x. Little-endian
                bit notation is always used. If y is less than z, this
                expression is an empty (zero length) bit string.

+               2's complement or floating-point addition.

-               2's complement or floating-point subtraction.

*               2's complement or floating-point multiplication.

div             2's complement integer division.

mod             2's complement modulo.

/               Floating-point division.

<               2's complement less than comparison.

and             Bit-wise logical AND.

or              Bit-wise logical OR.

xor             Bit-wise logical XOR.

nor             Bit-wise logical NOR.

GPR[x]          General-Register x. The content of GPR[0] is always zero.
                Attempts to alter the content of GPR[0] have no effect.

CPR[z,x]        Coprocessor unit z, general register x.

CCR[z,x]        Coprocessor unit z, control register x.

COC[z]          Coprocessor unit z condition signal.

BigEndianMem    Big-endian mode as configured at reset (0 → Little, 1 → Big).
                Specifies the endianness of the memory interface (see
                LoadMemory and StoreMemory), and the endianness of Kernel
                and Supervisor mode execution.

ReverseEndian   Signal to reverse the endianness of load and store
                instructions. This feature is available in User mode only,
                and is effected by setting the RE bit of the Status
                register. Thus, ReverseEndian may be computed as
                (SR_25 and User mode).

BigEndianCPU    The endianness for load and store instructions (0 → Little,
                1 → Big). In User mode, this endianness may be reversed by
                setting SR_25. Thus, BigEndianCPU may be computed as
                BigEndianMem XOR ReverseEndian.

LLbit           Bit of state to specify synchronization instructions. Set by
                LL, cleared by ERET and Invalidate and read by SC.

T+i:            Indicates the time steps between operations. Each of the
                statements within a time step are defined to be executed in
                sequential order (as modified by conditional and loop
                constructs). Operations which are marked T+i: are executed
                at instruction cycle i relative to the start of execution of
                the instruction. Thus, an instruction which starts at time j
                executes operations marked T+i: at time i + j. The
                interpretation of the order of execution between two
                instructions or two operations which execute at the same
                time should be pessimistic; the order is not defined.
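
The two derived endianness signals in Table A-1 are simple Boolean
expressions. The following Python sketch restates them. The function and
parameter names are illustrative: sr_re stands for bit 25 (RE) of the Status
register and user_mode is True when the processor is in User mode.

```python
def reverse_endian(sr_re: int, user_mode: bool) -> int:
    """ReverseEndian = (SR bit 25 and User mode)."""
    return 1 if (sr_re and user_mode) else 0


def big_endian_cpu(big_endian_mem: int, sr_re: int, user_mode: bool) -> int:
    """BigEndianCPU = BigEndianMem XOR ReverseEndian (0 = Little, 1 = Big)."""
    return big_endian_mem ^ reverse_endian(sr_re, user_mode)


# Big-endian memory interface, RE bit set, User mode.
print(big_endian_cpu(1, 1, True))
# Same settings outside User mode: the RE bit has no effect.
print(big_endian_cpu(1, 1, False))
```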

Instruction Notation Examples

The following examples illustrate the application of some of the
instruction notation conventions:



Example #1:

GPR[rt] ← immediate || 0^16

Sixteen zero bits are concatenated with an immediate
value (typically 16 bits), and the 32-bit string (with the lower
16 bits set to zero) is assigned to General-Purpose Register rt.



Example #2:

(immediate_15)^16 || immediate_{15..0}

Bit 15 (the sign bit) of an immediate value is extended for
16 bit positions, and the result is concatenated with bits 15
through 0 of the immediate value to form a 32-bit sign
extended value.
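
The replication and concatenation operators can be mirrored with integer
arithmetic. The following Python sketch evaluates both notation examples.
The helper names are illustrative, and bit strings are represented as
non-negative integers with the width tracked by the caller.

```python
def replicate(bit: int, count: int) -> int:
    """x^y: a count-bit string in which every bit equals the given bit."""
    return (1 << count) - 1 if bit else 0


def concat(high: int, low: int, low_width: int) -> int:
    """high || low, where low is low_width bits wide."""
    return (high << low_width) | low


def example_1(immediate: int) -> int:
    """immediate || 0^16: the immediate in the upper half, zeros below."""
    return concat(immediate & 0xFFFF, replicate(0, 16), 16)


def example_2(immediate: int) -> int:
    """(immediate_15)^16 || immediate_{15..0}: 32-bit sign extension."""
    imm = immediate & 0xFFFF
    sign = (imm >> 15) & 1
    return concat(replicate(sign, 16), imm, 16)


print(hex(example_1(0x1234)))
print(hex(example_2(0x8000)))
```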






A.4 Load and Store Instructions

In the R4000 implementation, the instruction immediately following a
load may use the loaded contents of the register. In such cases, the
hardware interlocks, requiring additional real cycles, so scheduling load
delay slots is still desirable, although not required for functional code.

Two special instructions are provided in the R4000 implementation of the
MIPS ISA, Load Linked and Store Conditional. These instructions are
used in carefully coded sequences to provide one of several
synchronization primitives, including test-and-set, bit-level locks,
semaphores, and sequencers/event counts.

In the load and store descriptions, the functions listed in Table A-2 are
used to summarize the handling of virtual addresses and physical
memory.



Table A-2 Load and Store Common Functions

Function             Meaning

AddressTranslation   Uses the TLB to find the physical address given the
                     virtual address. The function fails and an exception
                     is taken if the required translation is not present in
                     the TLB.

LoadMemory           Uses the cache and main memory to find the contents of
                     the word containing the specified physical address. The
                     low-order two bits of the address and the Access Type
                     field indicate which of the four bytes within the data
                     word need to be returned. If the cache is enabled for
                     this access, the entire word is returned and loaded
                     into the cache.

StoreMemory          Uses the cache, write buffer, and main memory to store
                     the word or part of word specified as data in the word
                     containing the specified physical address. The
                     low-order two bits of the address and the Access Type
                     field indicate which of the four bytes within the data
                     word should be stored.

As shown in Table A-3, the Access Type field indicates the size of the data
item to be loaded or stored. Regardless of access type or byte-numbering
order (endianness), the address specifies the byte which has the smallest
byte address in the addressed field. For a big-endian machine, this is the
leftmost byte and contains the sign for a 2's complement number; for a
little-endian machine, this is the rightmost byte.



Table A-3 Access Type Specifications for Loads/Stores

Access Type Mnemonic   Value   Meaning

DOUBLEWORD             7       8 bytes (64 bits)
SEPTIBYTE              6       7 bytes (56 bits)
SEXTIBYTE              5       6 bytes (48 bits)
QUINTIBYTE             4       5 bytes (40 bits)
WORD                   3       4 bytes (32 bits)
TRIPLEBYTE             2       3 bytes (24 bits)
HALFWORD               1       2 bytes (16 bits)
BYTE                   0       1 byte (8 bits)



The bytes within the addressed doubleword which are used can be 
determined directly from the access type and the three low-order bits of 
the address. 
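
Table A-3 shows that the access type value is the byte count minus one, so
the bytes used follow from that value and the low-order address bits. The
following Python sketch lists the byte addresses, taken modulo 8 within the
doubleword, that an access covers. It relies on the statement above that
the address names the smallest byte address in the field. It does not model
how those bytes map onto bus lanes for either endianness, and the function
names are illustrative.

```python
ACCESS_TYPES = {
    "BYTE": 0, "HALFWORD": 1, "TRIPLEBYTE": 2, "WORD": 3,
    "QUINTIBYTE": 4, "SEXTIBYTE": 5, "SEPTIBYTE": 6, "DOUBLEWORD": 7,
}


def access_size(access_type: int) -> int:
    """Number of bytes transferred for an Access Type value (Table A-3)."""
    return access_type + 1


def bytes_used(address: int, access_type: int) -> list:
    """Byte offsets (0..7) within the addressed doubleword that are used."""
    start = address & 0x7  # three low-order address bits
    size = access_size(access_type)
    if start + size > 8:
        raise ValueError("access crosses a doubleword boundary")
    return list(range(start, start + size))


print(bytes_used(0x1004, ACCESS_TYPES["WORD"]))
```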

A.5 Jump and Branch Instructions

All jump and branch instructions have an architectural delay of exactly
one instruction. That is, the instruction immediately following a jump or
branch (that is, occupying the delay slot) is always executed while the
target instruction is being fetched from storage. A delay slot may not itself
be occupied by a jump or branch instruction; however, this error is not
detected and the results of such an operation are undefined.

If an exception or interrupt prevents the completion of a legal instruction
during a delay slot, the hardware sets the EPC register to point at the jump
or branch instruction that precedes it. When the code is restarted, both the
jump or branch instruction and the instruction in the delay slot are
reexecuted.

Because jump and branch instructions may be restarted after exceptions or
interrupts, they must be restartable. Therefore, when a jump or branch
instruction stores a return link value, register 31 (the register in which the
link is stored) may not be used as a source register.

Since instructions must be word-aligned, a Jump Register or Jump and
Link Register instruction must use a register whose two low-order bits are
zero. If these low-order bits are not zero, an address exception will occur
when the jump target instruction is subsequently fetched.
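
Two of the rules above reduce to simple address arithmetic. The following
Python sketch is an illustration under stated assumptions, not a model of
the hardware: it takes the 4-byte instruction size from Section A.2 to
locate the jump or branch that precedes a delay slot, and the function names
are hypothetical.

```python
INSTRUCTION_BYTES = 4  # every CPU instruction is one 32-bit word


def epc_for_exception(pc: int, in_delay_slot: bool) -> int:
    """Address recorded in EPC for an instruction that could not complete.

    For an instruction in a delay slot, EPC points at the preceding jump
    or branch so that both instructions are re-executed on restart.
    """
    return pc - INSTRUCTION_BYTES if in_delay_slot else pc


def jump_register_target_aligned(target: int) -> bool:
    """True if the two low-order bits of a JR/JALR target are zero."""
    return (target & 0x3) == 0


print(hex(epc_for_exception(0x80001004, in_delay_slot=True)))
print(jump_register_target_aligned(0x80001002))
```

A target that fails the alignment check does not fault at the jump itself;
as stated above, the address exception occurs when the target instruction
is fetched.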

A.6 Coprocessor Instructions

Coprocessors are alternate execution units, which have register files
separate from the CPU. The MIPS architecture provides four coprocessor
units, or classes, and these coprocessors have two register spaces, each
space containing thirty-two 32-bit registers.

• The first space, coprocessor general registers, may be directly
loaded from memory and stored into memory, and their
contents may be transferred between the coprocessor and
processor.

• The second space, coprocessor control registers, may only have
their contents transferred directly between the coprocessor and
the processor. Coprocessor instructions may alter registers in
either space.

A.7 System Control Coprocessor (CP0) Instructions

There are some special limitations imposed on operations involving CP0,
which is incorporated within the CPU. Although load and store instructions
to transfer data to/from coprocessors and move control to/from
coprocessor instructions are generally permitted by the MIPS architecture,
CP0 is given a somewhat protected status since it has responsibility for
exception handling and memory management. Therefore, the move to/from
coprocessor instructions are the only valid mechanism for writing to
and reading from the CP0 registers.

Several CP0 instructions are defined to directly read, write, and probe TLB
entries and to modify the operating modes in preparation for returning to
User mode or interrupt-enabled states.
ADD                              Add                              ADD

  31      26 25    21 20    16 15    11 10     6 5       0
 +----------+--------+--------+--------+--------+---------+
 | SPECIAL  |   rs   |   rt   |   rd   |   0    |   ADD   |
 |  000000  |        |        |        | 00000  | 100000  |
 +----------+--------+--------+--------+--------+---------+
      6         5        5        5        5         6

Format:

ADD rd, rs, rt 

Description:

The contents of general register rs and the contents of general register rt 
are added to form the result. The result is placed into general register rd. 
In 64-bit mode, the operands must be valid sign-extended, 32-bit values. 

An overflow exception occurs if the carries out of bits 30 and 31 differ (2's 
complement overflow). The destination register rd is not modified when 
an integer overflow exception occurs. 

Operation:

32   T:  GPR[rd] <- GPR[rs] + GPR[rt]

64   T:  temp <- GPR[rs] + GPR[rt]
         GPR[rd] <- (temp_31)^32 || temp_31..0

Exceptions: 

Integer overflow exception 
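
As an illustration of the overflow rule described above, the following Python sketch models the 32-bit ADD. The names `add32` and `IntegerOverflow` are assumptions made for this example, and a Python exception merely stands in for the integer overflow exception; this is not a description of the hardware implementation.

```python
MASK32 = 0xFFFFFFFF


class IntegerOverflow(Exception):
    """Stands in for the integer overflow exception (illustrative only)."""


def add32(rs, rt):
    """Model ADD on two 32-bit register values given as unsigned ints."""
    rs &= MASK32
    rt &= MASK32
    # Carry out of bit 30: add the low 31 bits and inspect bit 31 of the sum.
    carry30 = ((rs & 0x7FFFFFFF) + (rt & 0x7FFFFFFF)) >> 31
    # Carry out of bit 31: add the full 32-bit values and inspect bit 32.
    carry31 = (rs + rt) >> 32
    if carry30 != carry31:
        # The destination register is left unmodified in this case.
        raise IntegerOverflow("carries out of bits 30 and 31 differ")
    return (rs + rt) & MASK32
```

For example, adding 0xFFFFFFFF (that is, -1) and 1 produces both carries and therefore no overflow, whereas adding 0x7FFFFFFF and 1 produces a carry out of bit 30 only and raises the exception.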
ADDI                        Add Immediate                        ADDI

  31      26 25    21 20    16 15                     0
 +----------+--------+--------+------------------------+
 |   ADDI   |   rs   |   rt   |       immediate        |
 |  001000  |        |        |                        |
 +----------+--------+--------+------------------------+
      6         5        5               16

Format:

ADDI rt, rs, immediate 

Description:

The 16-bit immediate is sign-extended and added to the contents of general 
register rs to form the result. The result is placed into general register rt. 
In 64-bit mode, the operand must be a valid sign-extended, 32-bit value. 

An overflow exception occurs if the carries out of bits 30 and 31 differ (2's 
complement overflow). The destination register rt is not modified when 
an integer overflow exception occurs. 

Operation:

32   T:  GPR[rt] <- GPR[rs] + (immediate_15)^16 || immediate_15..0

64   T:  temp <- GPR[rs] + (immediate_15)^48 || immediate_15..0
         GPR[rt] <- (temp_31)^32 || temp_31..0

Exceptions: 

Integer overflow exception 
ADDIU                   Add Immediate Unsigned                   ADDIU

  31      26 25    21 20    16 15                     0
 +----------+--------+--------+------------------------+
 |  ADDIU   |   rs   |   rt   |       immediate        |
 |  001001  |        |        |                        |
 +----------+--------+--------+------------------------+
      6         5        5               16

Format:

ADDIU rt, rs, immediate 

Description:

The 16-bit immediate is sign-extended and added to the contents of general 
register rs to form the result. The result is placed into general register rt. 
No integer overflow exception occurs under any circumstances. In 64-bit 
mode, the operand must be a valid sign-extended, 32-bit value. 

The only difference between this instruction and the ADDI instruction is 
that ADDIU never causes an overflow exception. 

Operation:

32   T:  GPR[rt] <- GPR[rs] + (immediate_15)^16 || immediate_15..0

64   T:  temp <- GPR[rs] + (immediate_15)^48 || immediate_15..0
         GPR[rt] <- (temp_31)^32 || temp_31..0

Exceptions: 

None 
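
The sign extension of the immediate and the absence of an overflow check can be sketched as follows. This is a minimal Python model of the 32-bit case; `sign_extend16` and `addiu32` are hypothetical helper names introduced only for this example.

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(imm):
    """Sign-extend a 16-bit immediate to a signed Python int."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm


def addiu32(rs, imm):
    """Model 32-bit ADDIU: the sum wraps modulo 2**32 and never traps."""
    return (rs + sign_extend16(imm)) & MASK32
```

Despite the "unsigned" in the name, the immediate is still sign-extended, so `addiu32(5, 0xFFFF)` subtracts one rather than adding 65535.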
ADDU                         Add Unsigned                         ADDU

  31      26 25    21 20    16 15    11 10     6 5       0
 +----------+--------+--------+--------+--------+---------+
 | SPECIAL  |   rs   |   rt   |   rd   |   0    |  ADDU   |
 |  000000  |        |        |        | 00000  | 100001  |
 +----------+--------+--------+--------+--------+---------+
      6         5        5        5        5         6

Format:

ADDU rd, rs, rt 

Description:

The contents of general register rs and the contents of general register rt 
are added to form the result. The result is placed into general register rd. 
No overflow exception occurs under any circumstances. In 64-bit mode, 
the operands must be valid sign-extended, 32-bit values. 

The only difference between this instruction and the ADD instruction is 
that ADDU never causes an overflow exception. 

Operation:

32   T:  GPR[rd] <- GPR[rs] + GPR[rt]

64   T:  temp <- GPR[rs] + GPR[rt]
         GPR[rd] <- (temp_31)^32 || temp_31..0

Exceptions: 

None 
AND                              And                              AND

  31      26 25    21 20    16 15    11 10     6 5       0
 +----------+--------+--------+--------+--------+---------+
 | SPECIAL  |   rs   |   rt   |   rd   |   0    |   AND   |
 |  000000  |        |        |        | 00000  | 100100  |
 +----------+--------+--------+--------+--------+---------+
      6         5        5        5        5         6

Format:

AND rd, rs, rt 

Description:

The contents of general register rs are combined with the contents of 
general register rt in a bit-wise logical AND operation. The result is placed 
into general register rd. 

Operation:

32   T:  GPR[rd] <- GPR[rs] and GPR[rt]

64   T:  GPR[rd] <- GPR[rs] and GPR[rt]

Exceptions: 

None 
ANDI                        And Immediate                        ANDI

  31      26 25    21 20    16 15                     0
 +----------+--------+--------+------------------------+
 |   ANDI   |   rs   |   rt   |       immediate        |
 |  001100  |        |        |                        |
 +----------+--------+--------+------------------------+
      6         5        5               16

Format:

ANDI rt, rs, immediate 

Description:

The 16-bit immediate is zero-extended and combined with the contents of 
general register rs in a bit-wise logical AND operation. The result is placed 
into general register rt. 

Operation:

32   T:  GPR[rt] <- 0^16 || (immediate and GPR[rs]_15..0)

64   T:  GPR[rt] <- 0^48 || (immediate and GPR[rs]_15..0)



Exceptions: 

None 
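
Because the immediate is zero-extended rather than sign-extended, the upper bits of the result are always zero. A minimal Python sketch of this behavior follows; the function name `andi` is used here only for illustration.

```python
def andi(rs, imm):
    """Model ANDI: AND the register value with a zero-extended 16-bit immediate."""
    # Zero extension means bits above bit 15 of the result are always 0.
    return rs & (imm & 0xFFFF)
```

Note that `andi(0xFFFFFFFF, 0x8000)` yields 0x8000; a sign-extended immediate would instead have preserved the upper 16 bits of rs.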
BCzF                 Branch On Coprocessor z False                 BCzF

  31      26 25    21 20    16 15                     0
 +----------+--------+--------+------------------------+
 |   COPz   |   BC   |  BCF   |         offset         |
 | 0100xx*  | 01000  | 00000  |                        |
 +----------+--------+--------+------------------------+
      6         5        5               16

Format:

BCzF offset 

Description:

A branch target address is computed from the sum of the address of the 
instruction in the delay slot and the 16-bit offset, shifted left two bits and 
sign-extended. If coprocessor z's condition signal (CpCond), as sampled 
during the previous instruction, is false, then the program branches to the 
target address with a delay of one instruction. 

Because the condition line is sampled during the previous instruction, 
there must be at least one instruction between this instruction and a 
coprocessor instruction that changes the condition line. 

Operation:

32   T-1:  condition <- not COC[z]
     T:    target <- (offset_15)^14 || offset || 0^2
     T+1:  if condition then
               PC <- PC + target
           endif

64   T-1:  condition <- not COC[z]
     T:    target <- (offset_15)^46 || offset || 0^2
     T+1:  if condition then
               PC <- PC + target
           endif

*See the table "Opcode Bit Encoding" below, or "CPU Instruction 
Opcode Bit Encoding" at the end of Appendix A. 

Exceptions:

Coprocessor unusable exception 

Opcode Bit Encoding:

BCzF   Bit #   31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 ... 0
       BC0F     0  1  0  0  0  0  0  1  0  0  0  0  0  0  0  0
       BC1F     0  1  0  0  0  1  0  1  0  0  0  0  0  0  0  0
       BC2F     0  1  0  0  1  0  0  1  0  0  0  0  0  0  0  0

       Bits 31..26: Opcode (bits 27..26 hold the Coprocessor Unit Number)
       Bits 25..21: BC sub-opcode
       Bits 20..16: Branch condition
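
The bit encoding shown for BCzF can be cross-checked with a short Python sketch that assembles the instruction word from its fields. The function name `encode_bczf` is hypothetical, and the restriction to units 0 through 2 simply mirrors the rows listed in the table.

```python
def encode_bczf(z, offset):
    """Assemble a BCzF instruction word for coprocessor unit z (0..2)."""
    if z not in (0, 1, 2):
        raise ValueError("the encoding table lists coprocessor units 0..2")
    opcode = 0b010000 | z    # COPz: 0100xx, where xx is the unit number
    bc = 0b01000             # BC sub-opcode, bits 25..21
    cond = 0b00000           # BCF branch condition, bits 20..16
    return (opcode << 26) | (bc << 21) | (cond << 16) | (offset & 0xFFFF)
```

The upper halfword of `encode_bczf(1, 0)` is 0x4500, which matches the BC1F row (0100 0101 0000 0000).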
BCzFL            Branch On Coprocessor z False Likely            BCzFL

  31      26 25    21 20    16 15                     0
 +----------+--------+--------+------------------------+
 |   COPz   |   BC   |  BCFL  |         offset         |
 | 0100xx*  | 01000  | 00010  |                        |
 +----------+--------+--------+------------------------+
      6         5        5               16

Format:

BCzFL offset 

Description:

A branch target address is computed from the sum of the address of the 
instruction in the delay slot and the 16-bit offset, shifted left two bits and 
sign-extended. If coprocessor z's condition line, as sampled during the 
previous instruction, is false, then the program branches to the target 
address with a delay of one instruction. 

If the conditional branch is not taken, the instruction in the branch delay 
slot is nullified. 

Because the condition line is sampled during the previous instruction, 
there must be at least one instruction between this instruction and a 
coprocessor instruction that changes the condition line. 

*See the table "Opcode Bit Encoding" below, or "CPU Instruction 
Opcode Bit Encoding" at the end of Appendix A. 

Operation:

32   T-1:  condition <- not COC[z]
     T:    target <- (offset_15)^14 || offset || 0^2
     T+1:  if condition then
               PC <- PC + target
           else
               NullifyCurrentInstruction
           endif

64   T-1:  condition <- not COC[z]
     T:    target <- (offset_15)^46 || offset || 0^2
     T+1:  if condition then
               PC <- PC + target
           else
               NullifyCurrentInstruction
           endif

Exceptions:

Coprocessor unusable exception 

Opcode Bit Encoding:

BCzFL  Bit #   31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 ... 0
       BC0FL    0  1  0  0  0  0  0  1  0  0  0  0  0  0  1  0
       BC1FL    0  1  0  0  0  1  0  1  0  0  0  0  0  0  1  0
       BC2FL    0  1  0  0  1  0  0  1  0  0  0  0  0  0  1  0

       Bits 31..26: Opcode (bits 27..26 hold the Coprocessor Unit Number)
       Bits 25..21: BC sub-opcode
       Bits 20..16: Branch condition
BCzT                  Branch On Coprocessor z True                  BCzT

  31      26 25    21 20    16 15                     0
 +----------+--------+--------+------------------------+
 |   COPz   |   BC   |  BCT   |         offset         |
 | 0100xx*  | 01000  | 00001  |                        |
 +----------+--------+--------+------------------------+
      6         5        5               16

Format:

BCzT offset 

Description:

A branch target address is computed from the sum of the address of the 
instruction in the delay slot and the 16-bit offset, shifted left two bits and 
sign-extended. If coprocessor z's condition signal (CpCond) is true, 
then the program branches to the target address, with a delay of one 
instruction. 

Because the condition line is sampled during the previous instruction, 
there must be at least one instruction between this instruction and a 
coprocessor instruction that changes the condition line. 

Operation:

32   T-1:  condition <- COC[z]
     T:    target <- (offset_15)^14 || offset || 0^2
     T+1:  if condition then
               PC <- PC + target
           endif

64   T-1:  condition <- COC[z]
     T:    target <- (offset_15)^46 || offset || 0^2
     T+1:  if condition then
               PC <- PC + target
           endif

*See the table "Opcode Bit Encoding" below, or "CPU Instruction 
Opcode Bit Encoding" at the end of Appendix A. 

Exceptions:

Coprocessor unusable exception 

Opcode Bit Encoding:

BCzT   Bit #   31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 ... 0
       BC0T     0  1  0  0  0  0  0  1  0  0  0  0  0  0  0  1
       BC1T     0  1  0  0  0  1  0  1  0  0  0  0  0  0  0  1
       BC2T     0  1  0  0  1  0  0  1  0  0  0  0  0  0  0  1

       Bits 31..26: Opcode (bits 27..26 hold the Coprocessor Unit Number)
       Bits 25..21: BC sub-opcode
       Bits 20..16: Branch condition
BCzTL             Branch On Coprocessor z True Likely             BCzTL

  31      26 25    21 20    16 15                     0
 +----------+--------+--------+------------------------+
 |   COPz   |   BC   |  BCTL  |         offset         |
 | 0100xx*  | 01000  | 00011  |                        |
 +----------+--------+--------+------------------------+
      6         5        5               16

Format:

BCzTL offset 

Description:

A branch target address is computed from the sum of the address of the 
instruction in the delay slot and the 16-bit offset, shifted left two bits and 
sign-extended. If coprocessor z's condition line, as sampled during the 
previous instruction, is true, then the program branches to the target 
address with a delay of one instruction. 

If the conditional branch is not taken, the instruction in the branch delay 
slot is nullified. 

Because the condition line is sampled during the previous instruction, 
there must be at least one instruction between this instruction and a 
coprocessor instruction that changes the condition line. 

Operation:

32   T-1:  condition <- COC[z]
     T:    target <- (offset_15)^14 || offset || 0^2
     T+1:  if condition then
               PC <- PC + target
           else
               NullifyCurrentInstruction
           endif

64   T-1:  condition <- COC[z]
     T:    target <- (offset_15)^46 || offset || 0^2
     T+1:  if condition then
               PC <- PC + target
           else
               NullifyCurrentInstruction
           endif

*See the table "Opcode Bit Encoding" below, or "CPU Instruction 
Opcode Bit Encoding" at the end of Appendix A. 

Exceptions:

Coprocessor unusable exception 

Opcode Bit Encoding:

BCzTL  Bit #   31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 ... 0
       BC0TL    0  1  0  0  0  0  0  1  0  0  0  0  0  0  1  1
       BC1TL    0  1  0  0  0  1  0  1  0  0  0  0  0  0  1  1
       BC2TL    0  1  0  0  1  0  0  1  0  0  0  0  0  0  1  1

       Bits 31..26: Opcode (bits 27..26 hold the Coprocessor Unit Number)
       Bits 25..21: BC sub-opcode
       Bits 20..16: Branch condition
BEQ                        Branch On Equal                        BEQ

  31      26 25    21 20    16 15                     0
 +----------+--------+--------+------------------------+
 |   BEQ    |   rs   |   rt   |         offset         |
 |  000100  |        |        |                        |
 +----------+--------+--------+------------------------+
      6         5        5               16

Format:

BEQ rs, rt, offset 

Description:

A branch target address is computed from the sum of the address of the 
instruction in the delay slot and the 16-bit offset, shifted left two bits and 
sign-extended. The contents of general register rs and the contents of 
general register rt are compared. If the two registers are equal, then the 
program branches to the target address, with a delay of one instruction. 

Operation:

32   T:    target <- (offset_15)^14 || offset || 0^2
           condition <- (GPR[rs] = GPR[rt])
     T+1:  if condition then
               PC <- PC + target
           endif

64   T:    target <- (offset_15)^46 || offset || 0^2
           condition <- (GPR[rs] = GPR[rt])
     T+1:  if condition then
               PC <- PC + target
           endif



Exceptions: 

None 
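
The branch target calculation described above (delay-slot address plus the sign-extended offset shifted left two bits) can be sketched in Python for 32-bit mode. The helper names `branch_target` and `beq_next_pc` are assumptions made for this illustration, and the model ignores everything except the PC arithmetic.

```python
MASK32 = 0xFFFFFFFF


def branch_target(branch_pc, offset):
    """Return the target of a taken branch located at branch_pc."""
    offset &= 0xFFFF
    # Sign-extend the 16-bit offset, then shift it left two bits.
    signed = offset - 0x10000 if offset & 0x8000 else offset
    delay_slot_pc = branch_pc + 4
    return (delay_slot_pc + (signed << 2)) & MASK32


def beq_next_pc(branch_pc, rs, rt, offset):
    """Return the PC of the instruction executed after the delay slot."""
    if rs == rt:
        return branch_target(branch_pc, offset)
    # Not taken: execution continues after the delay slot.
    return (branch_pc + 8) & MASK32
```

An offset of 0xFFFF (that is, -1) on a branch at 0x1000 therefore targets 0x1000 itself, since the offset is relative to the delay slot at 0x1004.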
BEQL                    Branch On Equal Likely                    BEQL

  31      26 25    21 20    16 15                     0
 +----------+--------+--------+------------------------+
 |   BEQL   |   rs   |   rt   |         offset         |
 |  010100  |        |        |                        |
 +----------+--------+--------+------------------------+
      6         5        5               16

Format:

BEQL rs, rt, offset 

Description:

A branch target address is computed from the sum of the address of the 
instruction in the delay slot and the 16-bit offset, shifted left two bits and 
sign-extended. The contents of general register rs and the contents of 
general register rt are compared. If the two registers are equal, then the 
program branches to the target address, with a delay of one instruction. 
If the conditional branch is not taken, the instruction in the branch delay 
slot is nullified. 

Operation:

32   T:    target <- (offset_15)^14 || offset || 0^2
           condition <- (GPR[rs] = GPR[rt])
     T+1:  if condition then
               PC <- PC + target
           else
               NullifyCurrentInstruction
           endif

64   T:    target <- (offset_15)^46 || offset || 0^2
           condition <- (GPR[rs] = GPR[rt])
     T+1:  if condition then
               PC <- PC + target
           else
               NullifyCurrentInstruction
           endif



Exceptions: 

None 
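
The difference between BEQ and BEQL lies only in the treatment of the delay slot when the branch is not taken. The following Python sketch states that difference explicitly; the function names `beq_step` and `beql_step` are hypothetical, and each returns a pair (branch taken, delay slot executes).

```python
def beq_step(rs, rt):
    """Plain BEQ: the delay slot instruction always executes."""
    taken = rs == rt
    return taken, True


def beql_step(rs, rt):
    """BEQL: the delay slot is nullified when the branch is not taken."""
    taken = rs == rt
    return taken, taken
```

With unequal operands, `beq_step` reports (False, True) while `beql_step` reports (False, False).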
BGEZ         Branch On Greater Than Or Equal To Zero         BGEZ

  31      26 25    21 20    16 15                     0
 +----------+--------+--------+------------------------+
 |  REGIMM  |   rs   |  BGEZ  |         offset         |
 |  000001  |        | 00001  |                        |
 +----------+--------+--------+------------------------+
      6         5        5               16

Format:

BGEZ rs, offset 

Description:

A branch target address is computed from the sum of the address of the 
instruction in the delay slot and the 16-bit offset, shifted left two bits and 
sign-extended. If the contents of general register rs have the sign bit 
cleared, then the program branches to the target address, with a delay of 
one instruction. 

Operation:

32   T:    target <- (offset_15)^14 || offset || 0^2
           condition <- (GPR[rs]_31 = 0)
     T+1:  if condition then
               PC <- PC + target
           endif

64   T:    target <- (offset_15)^46 || offset || 0^2
           condition <- (GPR[rs]_63 = 0)
     T+1:  if condition then
               PC <- PC + target
           endif



Exceptions: 

None 
BGEZAL    Branch On Greater Than Or Equal To Zero And Link    BGEZAL

  31      26 25    21 20    16 15                     0
 +----------+--------+--------+------------------------+
 |  REGIMM  |   rs   | BGEZAL |         offset         |
 |  000001  |        | 10001  |                        |
 +----------+--------+--------+------------------------+
      6         5        5               16

Format:

BGEZAL rs, offset 

Description:

A branch target address is computed from the sum of the address of the 
instruction in the delay slot and the 16-bit offset, shifted left two bits and 
sign-extended. Unconditionally, the address of the instruction after the 
delay slot is placed in the link register, r31. If the contents of general 
register rs have the sign bit cleared, then the program branches to the 
target address, with a delay of one instruction. 

General register rs may not be general register 31, because such an 
instruction is not restartable. An attempt to execute this instruction is not 
trapped, however. 

Operation:

32   T:    target <- (offset_15)^14 || offset || 0^2
           condition <- (GPR[rs]_31 = 0)
           GPR[31] <- PC + 8
     T+1:  if condition then
               PC <- PC + target
           endif

64   T:    target <- (offset_15)^46 || offset || 0^2
           condition <- (GPR[rs]_63 = 0)
           GPR[31] <- PC + 8
     T+1:  if condition then
               PC <- PC + target
           endif



Exceptions: 

None 
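
The unconditional link and the sign-bit test can be illustrated with a small Python model of the 32-bit case. The function name `bgezal32` and its return convention (link value, next PC after the delay slot) are assumptions made for this sketch, where `pc` is the address of the BGEZAL instruction itself.

```python
MASK32 = 0xFFFFFFFF


def bgezal32(pc, rs, offset):
    """Return (link_value, next_pc) for a BGEZAL located at address pc."""
    offset &= 0xFFFF
    signed = offset - 0x10000 if offset & 0x8000 else offset
    # The link is written to r31 whether or not the branch is taken.
    link = (pc + 8) & MASK32
    sign_bit_clear = ((rs >> 31) & 1) == 0
    if sign_bit_clear:
        next_pc = (pc + 4 + (signed << 2)) & MASK32
    else:
        next_pc = link
    return link, next_pc
```

Because the link value is stored even when the branch falls through, a handler that re-executes the instruction would see a modified r31; this is why rs may not be register 31.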
BGEZALL  Branch On Greater Than Or Equal To Zero And Link Likely  BGEZALL

Format:

BGEZALL rs, offset 

Description:

A branch target address is computed from the sum of the address of the 
instruction in the delay slot and the 16-bit offset, shifted left two bits and 
sign-extended. Unconditionally, the address of the instruction after the 
delay slot is placed in the link register, r31. If the contents of general 
register rs have the sign bit cleared, then the program branches to the 
target address, with a delay of one instruction. General register rs may not 
be general register 31, because such an instruction is not restartable. An 
attempt to execute this instruction is not trapped, however. If the 
conditional branch is not taken, the instruction in the branch delay slot is 
nullified. 

Operation:

32   T:    target <- (offset_15)^14 || offset || 0^2
           condition <- (GPR[rs]_31 = 0)
           GPR[31] <- PC + 8
     T+1:  if condition then
               PC <- PC + target
           else
               NullifyCurrentInstruction
           endif

64   T:    target <- (offset_15)^46 || offset || 0^2
           condition <- (GPR[rs]_63 = 0)
           GPR[31] <- PC + 8
     T+1:  if condition then
               PC <- PC + target
           else
               NullifyCurrentInstruction
           endif



Exceptions: 

None 
BGEZL     Branch On Greater Than Or Equal To Zero Likely     BGEZL

Format:

BGEZL rs, offset 

Description:

A branch target address is computed from the sum of the address of the 
instruction in the delay slot and the 16-bit offset, shifted left two bits and 
sign-extended. If the contents of general register rs have the sign bit 
cleared, then the program branches to the target address, with a delay of 
one instruction. If the conditional branch is not taken, the instruction in 
the branch delay slot is nullified. 

Operation:

32   T:    target <- (offset_15)^14 || offset || 0^2
           condition <- (GPR[rs]_31 = 0)
     T+1:  if condition then
               PC <- PC + target
           else
               NullifyCurrentInstruction
           endif

64   T:    target <- (offset_15)^46 || offset || 0^2
           condition <- (GPR[rs]_63 = 0)
     T+1:  if condition then
               PC <- PC + target
           else
               NullifyCurrentInstruction
           endif



Exceptions: 

None 
BGTZ                 Branch On Greater Than Zero                 BGTZ

Format:

BGTZ rs, offset 

Description:

A branch target address is computed from the sum of the address of the 
instruction in the delay slot and the 16-bit offset, shifted left two bits and 
sign-extended. The contents of general register rs are compared to zero. If 
the contents of general register rs have the sign bit cleared and are not 
equal to zero, then the program branches to the target address, with a 
delay of one instruction. 

Operation:

32   T:    target <- (offset_15)^14 || offset || 0^2
           condition <- (GPR[rs]_31 = 0) and (GPR[rs] ≠ 0^32)
     T+1:  if condition then
               PC <- PC + target
           endif

64   T:    target <- (offset_15)^46 || offset || 0^2
           condition <- (GPR[rs]_63 = 0) and (GPR[rs] ≠ 0^64)
     T+1:  if condition then
               PC <- PC + target
           endif



Exceptions: 

None 
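
The BGTZ condition combines a sign-bit test with a test for zero. A minimal Python sketch follows, with the register value passed as an unsigned integer; the name `bgtz_condition` and the `bits` parameter (32 or 64, selecting the operating mode) are assumptions for this example.

```python
def bgtz_condition(value, bits=32):
    """Return True when the register value is greater than zero."""
    # The sign bit is bit 31 in 32-bit mode and bit 63 in 64-bit mode.
    sign_bit = (value >> (bits - 1)) & 1
    return sign_bit == 0 and value != 0
```

The same bit pattern can give different answers in the two modes: 0x80000000 is negative as a 32-bit value but positive as a 64-bit value.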
BGTZL             Branch On Greater Than Zero Likely             BGTZL

Format:

BGTZL rs, offset 

Description:

A branch target address is computed from the sum of the address of the 
instruction in the delay slot and the 16-bit offset, shifted left two bits and 
sign-extended. The contents of general register rs are compared to zero. If 
the contents of general register rs have the sign bit cleared and are not 
equal to zero, then the program branches to the target address, with a 
delay of one instruction. If the conditional branch is not taken, the 
instruction in the branch delay slot is nullified. 

Operation:

32   T:    target <- (offset_15)^14 || offset || 0^2
           condition <- (GPR[rs]_31 = 0) and (GPR[rs] ≠ 0^32)
     T+1:  if condition then
               PC <- PC + target
           else
               NullifyCurrentInstruction
           endif

64   T:    target <- (offset_15)^46 || offset || 0^2
           condition <- (GPR[rs]_63 = 0) and (GPR[rs] ≠ 0^64)
     T+1:  if condition then
               PC <- PC + target
           else
               NullifyCurrentInstruction
           endif



Exceptions: 

None 
BLEZ           Branch On Less Than Or Equal To Zero           BLEZ

Format:

BLEZ rs, offset 

Description:

A branch target address is computed from the sum of the address of the 
instruction in the delay slot and the 16-bit offset, shifted left two bits and 
sign-extended. The contents of general register rs are compared to zero. If 
the contents of general register rs have the sign bit set, or are equal to zero, 
then the program branches to the target address, with a delay of one 
instruction. 

Operation:

32   T:    target <- (offset_15)^14 || offset || 0^2
           condition <- (GPR[rs]_31 = 1) or (GPR[rs] = 0^32)
     T+1:  if condition then
               PC <- PC + target
           endif

64   T:    target <- (offset_15)^46 || offset || 0^2
           condition <- (GPR[rs]_63 = 1) or (GPR[rs] = 0^64)
     T+1:  if condition then
               PC <- PC + target
           endif



Exceptions: 

None 
BLEZL       Branch On Less Than Or Equal To Zero Likely       BLEZL

Format:

BLEZL rs, offset 

Description:

A branch target address is computed from the sum of the address of the 
instruction in the delay slot and the 16-bit offset, shifted left two bits and 
sign-extended. The contents of general register rs are compared to zero. If 
the contents of general register rs have the sign bit set, or are equal to zero, 
then the program branches to the target address, with a delay of one 
instruction. 

If the conditional branch is not taken, the instruction in the branch delay 
slot is nullified. 

Operation:

32   T:    target <- (offset_15)^14 || offset || 0^2
           condition <- (GPR[rs]_31 = 1) or (GPR[rs] = 0^32)
     T+1:  if condition then
               PC <- PC + target
           else
               NullifyCurrentInstruction
           endif

64   T:    target <- (offset_15)^46 || offset || 0^2
           condition <- (GPR[rs]_63 = 1) or (GPR[rs] = 0^64)
     T+1:  if condition then
               PC <- PC + target
           else
               NullifyCurrentInstruction
           endif



Exceptions: 

None 
BLTZ                   Branch On Less Than Zero                   BLTZ

Format:

BLTZ rs, offset 

Description:

A branch target address is computed from the sum of the address of the 
instruction in the delay slot and the 16-bit offset, shifted left two bits and 
sign-extended. If the contents of general register rs have the sign bit set, 
then the program branches to the target address, with a delay of one 
instruction. 

Operation:

32   T:    target <- (offset_15)^14 || offset || 0^2
           condition <- (GPR[rs]_31 = 1)
     T+1:  if condition then
               PC <- PC + target
           endif

64   T:    target <- (offset_15)^46 || offset || 0^2
           condition <- (GPR[rs]_63 = 1)
     T+1:  if condition then
               PC <- PC + target
           endif



Exceptions: 

None 
BLTZAL             Branch On Less Than Zero And Link             BLTZAL

Format:

BLTZAL rs, offset 

Description:

A branch target address is computed from the sum of the address of the 
instruction in the delay slot and the 16-bit offset, shifted left two bits and 
sign-extended. Unconditionally, the address of the instruction after the 
delay slot is placed in the link register, r31. If the contents of general 
register rs have the sign bit set, then the program branches to the target 
address, with a delay of one instruction. 

General register rs may not be general register 31, because such an 
instruction is not restartable. An attempt to execute this instruction with 
register 31 specified as rs is not trapped, however. 

Operation:

32   T:    target <- (offset_15)^14 || offset || 0^2
           condition <- (GPR[rs]_31 = 1)
           GPR[31] <- PC + 8
     T+1:  if condition then
               PC <- PC + target
           endif

64   T:    target <- (offset_15)^46 || offset || 0^2
           condition <- (GPR[rs]_63 = 1)
           GPR[31] <- PC + 8
     T+1:  if condition then
               PC <- PC + target
           endif



Exceptions: 

None 
BLTZALL        Branch On Less Than Zero And Link Likely        BLTZALL

Format:

BLTZALL rs, offset 

Description:

A branch target address is computed from the sum of the address of the 
instruction in the delay slot and the 16-bit offset, shifted left two bits and 
sign-extended. Unconditionally, the address of the instruction after the 
delay slot is placed in the link register, r31. If the contents of general 
register rs have the sign bit set, then the program branches to the target 
address, with a delay of one instruction. 

General register rs may not be general register 31, because such an 
instruction is not restartable. An attempt to execute this instruction with 
register 31 specified as rs is not trapped, however. If the conditional 
branch is not taken, the instruction in the branch delay slot is nullified. 

Operation:

32   T:    target <- (offset_15)^14 || offset || 0^2
           condition <- (GPR[rs]_31 = 1)
           GPR[31] <- PC + 8
     T+1:  if condition then
               PC <- PC + target
           else
               NullifyCurrentInstruction
           endif

64   T:    target <- (offset_15)^46 || offset || 0^2
           condition <- (GPR[rs]_63 = 1)
           GPR[31] <- PC + 8
     T+1:  if condition then
               PC <- PC + target
           else
               NullifyCurrentInstruction
           endif



Exceptions: 

None 
BLTZL               Branch On Less Than Zero Likely               BLTZL

Format:

BLTZL rs, offset 

Description:

A branch target address is computed from the sum of the address of the 
instruction in the delay slot and the 16-bit offset, shifted left two bits and 
sign-extended. If the contents of general register rs have the sign bit set, 
then the program branches to the target address, with a delay of one 
instruction. If the conditional branch is not taken, the instruction in the 
branch delay slot is nullified. 

Operation:

32   T:    target <- (offset_15)^14 || offset || 0^2
           condition <- (GPR[rs]_31 = 1)
     T+1:  if condition then
               PC <- PC + target
           else
               NullifyCurrentInstruction
           endif

64   T:    target <- (offset_15)^46 || offset || 0^2
           condition <- (GPR[rs]_63 = 1)
     T+1:  if condition then
               PC <- PC + target
           else
               NullifyCurrentInstruction
           endif



Exceptions: 

None 
BNE                      Branch On Not Equal                      BNE

Format:

BNE rs, rt, offset 

Description:

A branch target address is computed from the sum of the address of the 
instruction in the delay slot and the 16-bit offset, shifted left two bits and 
sign-extended. The contents of general register rs and the contents of 
general register rt are compared. If the two registers are not equal, then 
the program branches to the target address, with a delay of one 
instruction. 

Operation:

32   T:    target <- (offset_15)^14 || offset || 0^2
           condition <- (GPR[rs] ≠ GPR[rt])
     T+1:  if condition then
               PC <- PC + target
           endif

64   T:    target <- (offset_15)^46 || offset || 0^2
           condition <- (GPR[rs] ≠ GPR[rt])
     T+1:  if condition then
               PC <- PC + target
           endif



Exceptions: 

None 
BNEL                  Branch On Not Equal Likely                  BNEL

Format:

BNEL rs, rt, offset 

Description:

A branch target address is computed from the sum of the address of the 
instruction in the delay slot and the 16-bit offset, shifted left two bits and 
sign-extended. The contents of general register rs and the contents of 
general register rt are compared. If the two registers are not equal, then 
the program branches to the target address, with a delay of one 
instruction. 

If the conditional branch is not taken, the instruction in the branch delay 
slot is nullified. 

Operation:

32   T:    target <- (offset_15)^14 || offset || 0^2
           condition <- (GPR[rs] ≠ GPR[rt])
     T+1:  if condition then
               PC <- PC + target
           else
               NullifyCurrentInstruction
           endif

64   T:    target <- (offset_15)^46 || offset || 0^2
           condition <- (GPR[rs] ≠ GPR[rt])
     T+1:  if condition then
               PC <- PC + target
           else
               NullifyCurrentInstruction
           endif



Exceptions: 

None 






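BREAK                         Breakpoint                         BREAK

  31      26 25                       6 5       0
 +----------+--------------------------+---------+
 | SPECIAL  |           code           |  BREAK  |
 |  000000  |                          | 001101  |
 +----------+--------------------------+---------+
      6                  20                 6

Format:

BREAK 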

Description: 

A breakpoint trap occurs, immediately and unconditionally transferring 
control to the exception handler. 

The code field is available for use as software parameters, but is retrieved 
by the exception handler only by loading the contents of the memory word 
containing the instruction. 

Operation: 

32, 64 T: BreakpointException 



Exceptions: 

Breakpoint exception 
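
Since the code field is not delivered to the handler directly, software has to load the instruction word and extract bits 25..6 itself. The following Python sketch shows that extraction; the function name `break_code` is hypothetical, and how the handler obtains the instruction word is outside the scope of the example.

```python
def break_code(instruction_word):
    """Extract the 20-bit code field (bits 25..6) from a BREAK instruction."""
    opcode = (instruction_word >> 26) & 0x3F
    function = instruction_word & 0x3F
    # SPECIAL opcode is 000000 and the BREAK function field is 001101.
    if opcode != 0b000000 or function != 0b001101:
        raise ValueError("not a BREAK instruction")
    return (instruction_word >> 6) & 0xFFFFF
```

For instance, the word `(7 << 6) | 0x0D` is a BREAK whose code field is 7.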
CACHE                            Cache                            CACHE

Format:

CACHE op, offset(base) 

Description:

The 16-bit offset is sign-extended and added to the contents of general 
register base to form a virtual address. The virtual address is translated to 
a physical address using the TLB, and the 5-bit sub-opcode specifies a 
cache operation for that address. 

If CP0 is not usable (User or Supervisor mode with the CP0 enable bit in the 
Status register clear), a coprocessor unusable exception is taken. The 
operation of this instruction on any operation/cache combination not 
listed below, or on a secondary cache when none is present, is undefined. 
The operation of this instruction on uncached addresses is also undefined. 

The Index operation uses part of the virtual address to specify a cache 
block. 

For a primary cache of 2^CACHEBITS bytes with 2^LINEBITS bytes per tag, 
vAddr_CACHEBITS..LINEBITS specifies the block. 

For a secondary cache of 2^CACHEBITS bytes with 2^LINEBITS bytes per tag, 
pAddr_CACHEBITS..LINEBITS specifies the block. 

Index Load Tag also uses vAddr_LINEBITS..3 to select the doubleword for 
reading ECC or parity. When the CE bit of the Status register is set, Hit 
WriteBack, Hit WriteBack Invalidate, Index WriteBack Invalidate, and Fill 
also use vAddr_LINEBITS..3 to select the doubleword that has its ECC or 
parity modified. This operation is performed unconditionally. 

The Hit operation accesses the specified cache as normal data references, 
and performs the specified operation if the cache block contains valid data 
with the specified physical address (a hit). If the cache block is invalid or 
contains a different address (a miss), no operation is performed. 
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

Write back from a primary cache goes to the secondary cache (if there is 
one), otherwise to memory. Write back from a secondary cache always 
goes to memory. A secondary write back always writes the most recent 
data; the data comes from the primary data cache if it is present there and
modified (the W bit is set). Otherwise the data comes from the specified secondary
cache. The address to be written is specified by the cache tag and not the 
translated physical address. 

TLB Refill and TLB Invalid exceptions can occur on any operation. For 
Index operations (where the physical address is used to index the cache 
but need not match the cache tag) unmapped addresses may be used to 
avoid TLB exceptions. This operation never causes TLB Modified or 
Virtual Coherency exceptions. 

Bits 17...16 of the instruction specify the cache as follows:

Code  Name  Cache

0     I     primary instruction
1     D     primary data
2     SI    secondary instruction
3     SD    secondary data (or combined instruction/data)
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Bits 20...18 (this value is listed under the Code column) of the instruction 
specify the operation as follows: 



Code  Caches  Name / Operation

0     I, SI   Index Invalidate
              Set the cache state of the cache block to Invalid.

0     D       Index Writeback Invalidate
              Examine the cache state and Writeback bit (W bit) of the
              primary data cache block at the index specified by the
              virtual address. If the state is not Invalid and the W bit
              is set, write the block back to the secondary cache (if
              present) or to memory (if no secondary cache). The address
              to write is taken from the primary cache tag. When a
              secondary cache is present, and the CE bit of the Status
              register is set, the contents of the ECC register are XOR'd
              into the computed check bits during the write to the
              secondary cache for the addressed doubleword. Set the cache
              state of the primary cache block to Invalid. The W bit is
              unchanged (and irrelevant because the state is Invalid).

0     SD      Index Writeback Invalidate
              Examine the cache state of the secondary data cache block at
              the index specified by the physical address. If the block is
              dirty (the state is Dirty Exclusive or Dirty Shared), write
              the data back to memory. Like all secondary writebacks, the
              operation writes any modified data for the addresses from
              the primary data cache. The address to write is taken from
              the secondary cache tag. The PIdx field of the secondary tag
              is used to determine the locations in the primaries to check
              for matching primary blocks. In all cases, set the state of
              the secondary cache block and all matching primary subblocks
              to Invalid. No Invalidate is sent on the R4000's system
              interface.

1     All     Index Load Tag
              Read the tag for the cache block at the specified index and
              place it into the TagLo and TagHi CP0 registers, ignoring
              any ECC or parity errors. Also load the data ECC or parity
              bits into the ECC register.

2     All     Index Store Tag
              Write the tag for the cache block at the specified index
              from the TagLo and TagHi CP0 registers. The processor uses
              computed parity for the primary caches and the TagLo
              register in the case of the secondary cache.
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

3     SD      Create Dirty Exclusive
              This operation is used to avoid loading data needlessly from
              memory when writing new contents into an entire cache block.
              If the cache block is valid but does not contain the
              specified address (a valid miss) the secondary block is
              vacated. The data is written back to memory if dirty and all
              matching blocks in both primary caches are invalidated. As
              usual during a secondary writeback, if the primary data
              cache contains modified data (matching blocks with the W bit
              set) that modified data is written to memory. If the cache
              block is valid and contains the specified physical address
              (a hit), the operation cleans up the primary caches to avoid
              virtual aliases: all blocks in both primary caches that
              match the secondary line are invalidated without writeback.
              Note that the search for matching primary blocks uses the
              virtual index of the PIdx field of the secondary cache tag
              (the virtual index when the location was last used) and not
              the virtual index of the virtual address used in the
              operation (the virtual index where the location will now be
              used). If the secondary tag and address do not match (miss),
              or the tag and address do match (hit) and the block is in a
              shared state, an invalidate for the specified address is
              sent over the System interface. In all cases, the cache
              block tag must be set to the specified physical address, the
              cache state must be set to Dirty Exclusive, and the virtual
              index field set from the virtual address. The CH bit in the
              Status register is set or cleared to indicate a hit or miss.

3     D       Create Dirty Exclusive
              This operation is used to avoid loading data needlessly from
              secondary cache or memory when writing new contents into an
              entire cache block. If the cache block does not contain the
              specified address, and the block is dirty, write it back to
              the secondary cache (if present) or otherwise to memory. In
              all cases, set the cache block tag to the specified physical
              address and set the cache state to Dirty Exclusive.

4     I, D    Hit Invalidate
              If the cache block contains the specified address, mark the
              cache block invalid.

4     SI, SD  Hit Invalidate
              If the cache block contains the specified address, mark the
              cache block invalid and also invalidate all matching blocks,
              if present, in the primary caches (the PIdx field of the
              secondary tag is used to determine the locations in the
              primaries to search). The CH bit in the Status register is
              set or cleared to indicate a hit or miss.

5     D       Hit Writeback Invalidate
              If the cache block contains the specified address, write the
              data back if it is dirty, and mark the cache block invalid.
              When a secondary cache is present, and the CE bit of the
              Status register is set, the contents of the ECC register are
              XOR'd into the computed check bits during the write to the
              secondary cache for the addressed doubleword.
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

5     SD      Hit Writeback Invalidate
              If the cache block contains the specified address, write
              back the data (if dirty), and mark the secondary cache block
              and all matching blocks in both primary caches invalid. As
              usual with secondary writebacks, modified data in the
              primary data cache (matching block with the W bit set) is
              used during the writeback. The PIdx field of the secondary
              tag is used to determine the locations in the primaries to
              check for matching primary blocks. The CH bit in the Status
              register is set or cleared to indicate a hit or miss.

5     I       Fill
              Fill the primary instruction cache block from secondary
              cache or memory. If the CE bit of the Status register is
              set, the content of the ECC register is used instead of the
              computed parity bits for the addressed doubleword when
              written to the instruction cache. For the R4000PC, the cache
              is filled from memory. For the R4000SC and R4000MC, the
              cache is filled from the secondary cache whether or not the
              secondary cache block is valid or contains the specified
              address.

6     D       Hit Writeback
              If the cache block contains the specified address, and the
              W bit is set, write back the data. The W bit is not cleared;
              a subsequent miss to the block will write it back again.
              This second writeback is redundant, but not incorrect. When
              a secondary cache is present, and the CE bit of the Status
              register is set, the contents of the ECC register are XOR'd
              into the computed check bits during the write to the
              secondary cache for the addressed doubleword. Note: The
              W bit is not cleared during this operation due to an
              artifact of the implementation; the W bit is implemented as
              part of the data side of the cache array so that it can be
              written during a data write.

6     SD      Hit Writeback
              If the cache block contains the specified address, and the
              cache state is Dirty Exclusive or Dirty Shared, data is
              written back to memory. The cache state is unchanged; a
              subsequent miss to the block causes it to be written back
              again. This second writeback is redundant, but not
              incorrect. The CH bit in the Status register is set or
              cleared to indicate a hit or miss. The writeback looks in
              the primary data cache for modified data, but does not
              invalidate or clear the Writeback bit in the primary data
              cache. Note: The state of the secondary block is not changed
              to clean during this operation because the W bit of matching
              sub-blocks cannot be cleared to put the primary block in a
              clean state.

6     I       Hit Writeback
              If the cache block contains the specified address, data is
              written back unconditionally. When a secondary cache is
              present, and the CE bit of the Status register is set, the
              contents of the ECC register are XOR'd into the computed
              check bits during the write to the secondary cache for the
              addressed doubleword.
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

7     SI, SD  Hit Set Virtual
              This operation is used to change the virtual index of
              secondary cache contents, avoiding unnecessary memory
              operations. If the cache block contains the specified
              address, invalidate matching blocks in the primary caches at
              the index formed by concatenating PIdx in the secondary
              cache tag (not the virtual address of the operation) and
              vAddr_{11...4}, and then set the virtual index field of the
              secondary cache tag from the specified virtual address.
              Modified data in the primary data cache is not preserved by
              the operation and should be explicitly written back before
              this operation. The CH bit in the Status register is set or
              cleared to indicate a hit or miss.



Operation: 



32, 64  T:  vAddr <- ((offset_{15})^{48} || offset_{15...0}) + GPR[base]
            (pAddr, uncached) <- AddressTranslation (vAddr, DATA)
            CacheOp (op, vAddr, pAddr)



Exceptions: 

Coprocessor unusable exception 
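
The two tables in this section split the 5-bit op field into a cache selector (bits 17...16) and an operation code (bits 20...18). The following Python sketch shows one way to decode that field; the names `CACHES`, `OPERATIONS` and `decode_cache_op` are illustrative assumptions, and the example word has only the op field filled in rather than being a complete instruction encoding.

```python
# Cache selector, bits 17...16 of the instruction.
CACHES = {0: "I", 1: "D", 2: "SI", 3: "SD"}

# Operation names by code (bits 20...18) and cache, transcribed from the
# operation table in this section. Combinations that are absent are the
# ones the text describes as undefined.
OPERATIONS = {
    0: {"I": "Index Invalidate", "SI": "Index Invalidate",
        "D": "Index Writeback Invalidate",
        "SD": "Index Writeback Invalidate"},
    1: dict.fromkeys(CACHES.values(), "Index Load Tag"),
    2: dict.fromkeys(CACHES.values(), "Index Store Tag"),
    3: {"D": "Create Dirty Exclusive", "SD": "Create Dirty Exclusive"},
    4: dict.fromkeys(CACHES.values(), "Hit Invalidate"),
    5: {"D": "Hit Writeback Invalidate",
        "SD": "Hit Writeback Invalidate", "I": "Fill"},
    6: {"I": "Hit Writeback", "D": "Hit Writeback", "SD": "Hit Writeback"},
    7: {"SI": "Hit Set Virtual", "SD": "Hit Set Virtual"},
}


def decode_cache_op(instruction):
    """Return (cache, operation name); the name is None when undefined."""
    cache = CACHES[(instruction >> 16) & 0x3]
    code = (instruction >> 18) & 0x7
    return cache, OPERATIONS[code].get(cache)


# Hypothetical word: code 5 in bits 20...18, cache 1 (D) in bits 17...16.
word = (5 << 18) | (1 << 16)
print(decode_cache_op(word))
```

Returning None for an unlisted combination is a choice made for this sketch; the manual only states that such combinations are undefined.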
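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CFCz        Move Control From Coprocessor        CFCz

 31      26 25      21 20      16 15      11 10                  0
+----------+----------+----------+----------+---------------------+
|   COPz   |    CF    |    rt    |    rd    |          0          |
| 0100xx*  |  00010   |          |          |    000 0000 0000    |
+----------+----------+----------+----------+---------------------+
     6          5          5          5               11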





Format: 

CFCz rt, rd 

Description: 

The contents of coprocessor control register rd of coprocessor unit z are 
loaded into general register rt. 

This instruction is not valid for CP0.

Operation:

32    T:    data <- CCR[z,rd]
      T+1:  GPR[rt] <- data

64    T:    data <- (CCR[z,rd]_{31})^{32} || CCR[z,rd]
      T+1:  GPR[rt] <- data



Exceptions: 

Coprocessor unusable exception 

*Opcode Bit Encoding:

       Bit #   31 30 29 28 27 26   25 24 23 22 21   20 ... 0
       CFC1     0  1  0  0  0  1    0  0  0  1  0
       CFC2     0  1  0  0  1  0    0  0  0  1  0

Bits 31...26 hold the opcode, of which bits 27...26 give the coprocessor
unit number. Bits 25...21 hold the coprocessor suboperation.
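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COPz        Coprocessor Operation        COPz

 31      26  25  24                                             0
+----------+----+------------------------------------------------+
|   COPz   | CO |                     cofun                      |
| 0100xx*  | 1  |                                                |
+----------+----+------------------------------------------------+
     6       1                          25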



Format: 

COPz cofun

Description:

A coprocessor operation is performed. The operation may specify and
reference internal coprocessor registers, and may change the state of the
coprocessor condition line, but does not modify state within the processor
or the cache/memory system. Details of coprocessor operations are
contained in Appendix B.

Operation:



32, 64 T: CoprocessorOperation (z, cofun) 



Exceptions: 

Coprocessor unusable exception 

Coprocessor interrupt or Floating-Point Exception (R4000 CP1 only)

*Opcode Bit Encoding:

       Bit #   31 30 29 28 27 26   25   24 ... 0
       COP0     0  1  0  0  0  0    1
       COP1     0  1  0  0  0  1    1
       COP2     0  1  0  0  1  0    1

Bits 31...26 hold the opcode, of which bits 27...26 give the coprocessor
unit number. Bit 25 is the CO sub-opcode (see end of Appendix A).
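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CTCz        Move Control To Coprocessor        CTCz

 31      26 25      21 20      16 15      11 10                  0
+----------+----------+----------+----------+---------------------+
|   COPz   |    CT    |    rt    |    rd    |          0          |
| 0100xx*  |  00110   |          |          |    000 0000 0000    |
+----------+----------+----------+----------+---------------------+
     6          5          5          5               11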



Format: 

CTCz rt, rd 

Description: 

The contents of general register rt are loaded into control register rd of 
coprocessor unit z. 

This instruction is not valid for CP0.

Operation:

32, 64  T:    data <- GPR[rt]
        T+1:  CCR[z,rd] <- data



Exceptions: 

Coprocessor unusable exception
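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DADD        Doubleword Add        DADD

 31      26 25      21 20      16 15      11 10       6 5        0
+----------+----------+----------+----------+----------+----------+
| SPECIAL  |    rs    |    rt    |    rd    |    0     |   DADD   |
|  000000  |          |          |          |  00000   |  101100  |
+----------+----------+----------+----------+----------+----------+
     6          5          5          5          5          6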



Format: 

DADD rd, rs, rt 

Description: 

The contents of general register rs and the contents of general register rt 
are added to form the result. The result is placed into general register rd. 

An overflow exception occurs if the carries out of bits 62 and 63 differ (2's 
complement overflow). The destination register rd is not modified when 
an integer overflow exception occurs. 

This operation is only defined for the R4000 operating in 64-bit mode. 
Execution of this instruction in 32-bit mode causes a reserved instruction 
exception. 

Operation: 

64    T:    GPR[rd] <- GPR[rs] + GPR[rt]



Exceptions: 

Integer overflow exception 

Reserved instruction exception (R4000 in 32-bit mode) 
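
The overflow rule stated in the description (the carries out of bits 62 and 63 differ) can be checked directly. The sketch below is a minimal Python model of that rule, not the processor's implementation; the function name `dadd` and the convention of returning an overflow flag instead of raising an exception are assumptions made for illustration.

```python
MASK64 = (1 << 64) - 1
MASK63 = (1 << 63) - 1


def dadd(rs, rt):
    """Return (wrapped 64-bit sum, overflow flag) for register values."""
    a, b = rs & MASK64, rt & MASK64
    total = a + b
    # Carry out of bit 63: the bit that falls off the top of the sum.
    carry_out_63 = (total >> 64) & 1
    # Carry out of bit 62: the carry produced by adding the low 63 bits.
    carry_out_62 = ((a & MASK63) + (b & MASK63)) >> 63
    overflow = carry_out_62 != carry_out_63
    # On overflow the manual says rd is not modified; a caller modelling
    # the instruction should discard the sum when the flag is set.
    return total & MASK64, overflow


print(dadd(0x7FFFFFFFFFFFFFFF, 1))
print(dadd(MASK64, 1))
```

The first call adds 1 to the largest positive value and reports overflow; the second adds 1 to -1 and does not.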
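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DADDI        Doubleword Add Immediate        DADDI

 31      26 25      21 20      16 15                             0
+----------+----------+----------+--------------------------------+
|  DADDI   |    rs    |    rt    |           immediate            |
|  011000  |          |          |                                |
+----------+----------+----------+--------------------------------+
     6          5          5                     16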





Format: 

DADDI rt, rs, immediate 

Description: 

The 16-bit immediate is sign-extended and added to the contents of general 
register rs to form the result. The result is placed into general register rt. 

An overflow exception occurs if carries out of bits 62 and 63 differ (2's 
complement overflow). The destination register rt is not modified when 
an integer overflow exception occurs. 

This operation is only defined for the R4000 operating in 64-bit mode. 
Execution of this instruction in 32-bit mode causes a reserved instruction 
exception. 

Operation: 

64    T:    GPR[rt] <- GPR[rs] + (immediate_{15})^{48} || immediate_{15...0}



Exceptions: 

Integer overflow exception 

Reserved instruction exception (R4000 in 32-bit mode) 
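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DADDIU        Doubleword Add Immediate Unsigned        DADDIU

 31      26 25      21 20      16 15                             0
+----------+----------+----------+--------------------------------+
|  DADDIU  |    rs    |    rt    |           immediate            |
|  011001  |          |          |                                |
+----------+----------+----------+--------------------------------+
     6          5          5                     16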





Format: 

DADDIU rt, rs, immediate 

Description: 

The 16-bit immediate is sign-extended and added to the contents of general 
register rs to form the result. The result is placed into general register rt. 
No integer overflow exception occurs under any circumstances. 

The only difference between this instruction and the DADDI instruction is 
that DADDIU never causes an overflow exception. 

This operation is only defined for the R4000 operating in 64-bit mode. 
Execution of this instruction in 32-bit mode causes a reserved instruction 
exception. 

Operation: 



64    T:    GPR[rt] <- GPR[rs] + (immediate_{15})^{48} || immediate_{15...0}



Exceptions: 

Reserved instruction exception (R4000 in 32-bit mode) 
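
Although the mnemonic says "unsigned", the description states that the immediate is still sign-extended; only the overflow exception is suppressed. A minimal Python sketch of that behaviour follows. The helper names `sign_extend` and `daddiu` are assumptions for illustration, and the sum is modelled as wrapping modulo 2^64.

```python
MASK64 = (1 << 64) - 1


def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a 2's complement number."""
    value &= (1 << bits) - 1
    sign = 1 << (bits - 1)
    return (value ^ sign) - sign


def daddiu(rs, immediate):
    """64-bit add of rs and the sign-extended 16-bit immediate, no trap."""
    return (rs + sign_extend(immediate, 16)) & MASK64


# An immediate of 0xFFFF is -1 after sign extension.
print(hex(daddiu(10, 0xFFFF)))
# The largest positive value plus 1 wraps silently instead of trapping.
print(hex(daddiu(0x7FFFFFFFFFFFFFFF, 1)))
```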



DADDU Doubleword Add Unsigned DADDU 



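 31      26 25      21 20      16 15      11 10       6 5        0
+----------+----------+----------+----------+----------+----------+
| SPECIAL  |    rs    |    rt    |    rd    |    0     |  DADDU   |
|  000000  |          |          |          |  00000   |  101101  |
+----------+----------+----------+----------+----------+----------+
     6          5          5          5          5          6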



Format: 

DADDU rd, rs, rt 

Description: 

The contents of general register rs and the contents of general register rt 
are added to form the result. The result is placed into general register rd.

No overflow exception occurs under any circumstances. 

The only difference between this instruction and the DADD instruction is 
that DADDU never causes an overflow exception. 

This operation is only defined for the R4000 operating in 64-bit mode. 
Execution of this instruction in 32-bit mode causes a reserved instruction 
exception. 

Operation: 

64    T:    GPR[rd] <- GPR[rs] + GPR[rt]



Exceptions: 

Reserved instruction exception (R4000 in 32-bit mode) 
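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DDIV        Doubleword Divide        DDIV

 31      26 25      21 20      16 15                  6 5        0
+----------+----------+----------+---------------------+----------+
| SPECIAL  |    rs    |    rt    |          0          |   DDIV   |
|  000000  |          |          |    00 0000 0000     |  011110  |
+----------+----------+----------+---------------------+----------+
     6          5          5               10               6

Format:

DDIV rs, rt

Description: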

The contents of general register rs are divided by the contents of general 
register rt, treating both operands as 2's complement values. No overflow 
exception occurs under any circumstances, and the result of this operation 
is undefined when the divisor is zero. 

This instruction is typically followed by additional instructions to check 
for a zero divisor and for overflow. 

When the operation completes, the quotient word of the double result is 
loaded into special register LO, and the remainder word of the double 
result is loaded into special register HI. 

If either of the two preceding instructions is MFHI or MFLO, the results of 
those instructions are undefined. Correct operation requires separating
reads of HI or LO from writes by two or more instructions. 

This operation is only defined for the R4000 operating in 64-bit mode. 
Execution of this instruction in 32-bit mode causes a reserved instruction 
exception. 

Operation: 



64    T-2:  LO <- undefined
            HI <- undefined
      T-1:  LO <- undefined
            HI <- undefined
      T:    LO <- GPR[rs] div GPR[rt]
            HI <- GPR[rs] mod GPR[rt]



Exceptions: 

Reserved instruction exception (R4000 in 32-bit mode) 
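
Because DDIV raises no exception, the description notes that software typically checks for a zero divisor and for overflow itself. The Python sketch below models the LO/HI results and those two checks. It assumes that the quotient is truncated toward zero and that the remainder takes the sign of the dividend, which this section does not state explicitly; the function names are likewise illustrative.

```python
MASK64 = (1 << 64) - 1


def to_signed64(x):
    """Interpret a 64-bit register value as a 2's complement integer."""
    x &= MASK64
    return x - (1 << 64) if x >> 63 else x


def ddiv(rs, rt):
    """Return (LO, HI) as 64-bit values, or None for a zero divisor."""
    dividend, divisor = to_signed64(rs), to_signed64(rt)
    if divisor == 0:
        return None  # result is undefined according to the description
    # Assumed rounding: truncate toward zero.
    quotient = abs(dividend) // abs(divisor)
    if (dividend < 0) != (divisor < 0):
        quotient = -quotient
    remainder = dividend - quotient * divisor
    return quotient & MASK64, remainder & MASK64


def ddiv_overflows(rs, rt):
    """True for the single signed case whose quotient does not fit."""
    return to_signed64(rs) == -(1 << 63) and to_signed64(rt) == -1


print(ddiv(-7 & MASK64, 2))
print(ddiv_overflows(1 << 63, MASK64))
```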



DDIVU Doubleword Divide Unsigned DDIVU 



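 31      26 25      21 20      16 15                  6 5        0
+----------+----------+----------+---------------------+----------+
| SPECIAL  |    rs    |    rt    |          0          |  DDIVU   |
|  000000  |          |          |    00 0000 0000     |  011111  |
+----------+----------+----------+---------------------+----------+
     6          5          5               10               6

Format:

DDIVU rs, rt

Description: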

The contents of general register rs are divided by the contents of general 
register rt, treating both operands as unsigned values. No integer 
overflow exception occurs under any circumstances, and the result of this 
operation is undefined when the divisor is zero. 

This instruction is typically followed by additional instructions to check
for a zero divisor.

When the operation completes, the quotient word of the double result is 
loaded into special register LO, and the remainder word of the double 
result is loaded into special register HI. 

If either of the two preceding instructions is MFHI or MFLO, the results of 
those instructions are undefined. Correct operation requires separating 
reads of HI or LO from writes by two or more instructions. 

This operation is only defined for the R4000 operating in 64-bit mode. 
Execution of this instruction in 32-bit mode causes a reserved instruction 
exception. 

Operation:

64    T-2:  LO <- undefined
            HI <- undefined
      T-1:  LO <- undefined
            HI <- undefined
      T:    LO <- (0 || GPR[rs]) div (0 || GPR[rt])
            HI <- (0 || GPR[rs]) mod (0 || GPR[rt])



Exceptions: 

Reserved instruction exception (R4000 in 32-bit mode) 
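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DIV        Divide        DIV

 31      26 25      21 20      16 15                  6 5        0
+----------+----------+----------+---------------------+----------+
| SPECIAL  |    rs    |    rt    |          0          |   DIV    |
|  000000  |          |          |    00 0000 0000     |  011010  |
+----------+----------+----------+---------------------+----------+
     6          5          5               10               6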





Format: 

DIV rs, rt 

Description: 

The contents of general register rs are divided by the contents of general 
register rt, treating both operands as 2's complement values. No overflow 
exception occurs under any circumstances, and the result of this operation
is undefined when the divisor is zero. 

In 64-bit mode, the operands must be valid sign-extended, 32-bit values. 

This instruction is typically followed by additional instructions to check
for a zero divisor and for overflow. 

When the operation completes, the quotient word of the double result is 
loaded into special register LO, and the remainder word of the double 
result is loaded into special register HI. 

If either of the two preceding instructions is MFHI or MFLO, the results of 
those instructions are undefined. Correct operation requires separating 
reads of HI or LO from writes by two or more instructions. 
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

Operation:

32    T-2:  LO <- undefined
            HI <- undefined
      T-1:  LO <- undefined
            HI <- undefined
      T:    LO <- GPR[rs] div GPR[rt]
            HI <- GPR[rs] mod GPR[rt]

64    T-2:  LO <- undefined
            HI <- undefined
      T-1:  LO <- undefined
            HI <- undefined
      T:    q  <- GPR[rs]_{31...0} div GPR[rt]_{31...0}
            r  <- GPR[rs]_{31...0} mod GPR[rt]_{31...0}
            LO <- (q_{31})^{32} || q_{31...0}
            HI <- (r_{31})^{32} || r_{31...0}



Exceptions: 

None 
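
In 64-bit mode the operation above works on the low 32 bits of each operand and sign-extends the 32-bit quotient and remainder into LO and HI. The Python sketch below models that path only. As an assumption not stated in this section, the quotient is truncated toward zero; the names `sign_extend` and `div_64bit_mode` are hypothetical.

```python
MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1


def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a 2's complement number."""
    value &= (1 << bits) - 1
    sign = 1 << (bits - 1)
    return (value ^ sign) - sign


def div_64bit_mode(rs, rt):
    """Return (LO, HI) for DIV in 64-bit mode, or None for a zero divisor."""
    a = sign_extend(rs, 32)  # GPR[rs] bits 31...0 as a signed value
    b = sign_extend(rt, 32)  # GPR[rt] bits 31...0 as a signed value
    if b == 0:
        return None  # result is undefined according to the description
    # Assumed rounding: truncate toward zero.
    q = abs(a) // abs(b)
    if (a < 0) != (b < 0):
        q = -q
    r = a - q * b
    # Sign-extend bit 31 of each 32-bit result across the upper 32 bits.
    lo = sign_extend(q & MASK32, 32) & MASK64
    hi = sign_extend(r & MASK32, 32) & MASK64
    return lo, hi


print(div_64bit_mode(0xFFFFFFFFFFFFFFF9, 2))
```

The example divides -7 by 2, so both LO and HI come back with their upper 32 bits set.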
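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DIVU        Divide Unsigned        DIVU

 31      26 25      21 20      16 15                  6 5        0
+----------+----------+----------+---------------------+----------+
| SPECIAL  |    rs    |    rt    |          0          |   DIVU   |
|  000000  |          |          |    00 0000 0000     |  011011  |
+----------+----------+----------+---------------------+----------+
     6          5          5               10               6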





Format: 

DIVU rs, rt 

Description: 

The contents of general register rs are divided by the contents of general 
register rt, treating both operands as unsigned values. No integer 
overflow exception occurs under any circumstances, and the result of this 
operation is undefined when the divisor is zero. 

In 64-bit mode, the operands must be valid sign-extended, 32-bit values. 

This instruction is typically followed by additional instructions to check
for a zero divisor. 

When the operation completes, the quotient word of the double result is 
loaded into special register LO, and the remainder word of the double 
result is loaded into special register HI. 

If either of the two preceding instructions is MFHI or MFLO, the results of 
those instructions are undefined. Correct operation requires separating 
reads of HI or LO from writes by two or more instructions. 
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

Operation:

32    T-2:  LO <- undefined
            HI <- undefined
      T-1:  LO <- undefined
            HI <- undefined
      T:    LO <- (0 || GPR[rs]) div (0 || GPR[rt])
            HI <- (0 || GPR[rs]) mod (0 || GPR[rt])

64    T-2:  LO <- undefined
            HI <- undefined
      T-1:  LO <- undefined
            HI <- undefined
      T:    q  <- (0 || GPR[rs]_{31...0}) div (0 || GPR[rt]_{31...0})
            r  <- (0 || GPR[rs]_{31...0}) mod (0 || GPR[rt]_{31...0})
            LO <- (q_{31})^{32} || q_{31...0}
            HI <- (r_{31})^{32} || r_{31...0}



Exceptions: 

None 
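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DMFC0        Doubleword Move From System Control Coprocessor        DMFC0

 31      26 25      21 20      16 15      11 10                  0
+----------+----------+----------+----------+---------------------+
|   COP0   |   DMF    |    rt    |    rd    |          0          |
|  010000  |  00001   |          |          |    000 0000 0000    |
+----------+----------+----------+----------+---------------------+
     6          5          5          5               11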



Format: 

DMFC0 rt, rd

Description:

The contents of coprocessor register rd of CP0 are loaded into general
register rt.

This operation is defined for the R4000 operating in 64-bit mode and in
32-bit kernel mode. Execution of this instruction in 32-bit user or supervisor
mode causes a reserved instruction exception. All 64 bits of the general
register destination are written from the coprocessor register source. The
operation of DMFC0 on a 32-bit coprocessor 0 register is undefined.



Operation: 



64    T:    data <- CPR[0,rd]
      T+1:  GPR[rt] <- data



Exceptions: 

Coprocessor unusable exception 

Reserved instruction exception (R4000 in 32-bit user mode,
R4000 in 32-bit supervisor mode)
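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DMTC0        Doubleword Move To System Control Coprocessor        DMTC0

 31      26 25      21 20      16 15      11 10                  0
+----------+----------+----------+----------+---------------------+
|   COP0   |   DMT    |    rt    |    rd    |          0          |
|  010000  |  00101   |          |          |    000 0000 0000    |
+----------+----------+----------+----------+---------------------+
     6          5          5          5               11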



Format: 

DMTC0 rt, rd

Description:

The contents of general register rt are loaded into coprocessor register rd
of CP0.

This operation is defined for the R4000 operating in 64-bit mode or in
32-bit kernel mode. Execution of this instruction in 32-bit user or supervisor
mode causes a reserved instruction exception.

All 64 bits of the coprocessor 0 register are written from the general
register source. The operation of DMTC0 on a 32-bit coprocessor 0 register
is undefined.

Because the state of the virtual address translation system may be altered
by this instruction, the operation of load instructions, store instructions,
and TLB operations immediately prior to and after this instruction is
undefined.



Operation: 



64    T:    data <- GPR[rt]
      T+1:  CPR[0,rd] <- data



Exceptions: 

Coprocessor unusable exception

Reserved instruction exception (R4000 in 32-bit user mode,
R4000 in 32-bit supervisor mode)
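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DMULT        Doubleword Multiply        DMULT

 31      26 25      21 20      16 15                  6 5        0
+----------+----------+----------+---------------------+----------+
| SPECIAL  |    rs    |    rt    |          0          |  DMULT   |
|  000000  |          |          |    00 0000 0000     |  011100  |
+----------+----------+----------+---------------------+----------+
     6          5          5               10               6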



Format: 

DMULT rs, rt 

Description: 

The contents of general registers rs and rt are multiplied, treating both
operands as 2's complement values. No integer overflow exception occurs 
under any circumstances. 

When the operation completes, the low-order word of the double result is 
loaded into special register LO, and the high-order word of the double 
result is loaded into special register HI. 

If either of the two preceding instructions is MFHI or MFLO, the results of 
these instructions are undefined. Correct operation requires separating 
reads of HI or LO from writes by a minimum of two other instructions. 

This operation is only defined for the R4000 operating in 64-bit mode. 
Execution of this instruction in 32-bit mode causes a reserved instruction 
exception. 

Operation: 



64    T-2:  LO <- undefined
            HI <- undefined
      T-1:  LO <- undefined
            HI <- undefined
      T:    t  <- GPR[rs] * GPR[rt]
            LO <- t_{63...0}
            HI <- t_{127...64}



Exceptions: 

Reserved instruction exception (R4000 in 32-bit mode) 
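
The operation above forms a 128-bit product t and splits it between LO and HI. A minimal Python sketch of that split for the signed case follows; the names `to_signed64` and `dmult` are assumptions for illustration, and Python's arbitrary-precision integers stand in for the 128-bit intermediate.

```python
MASK64 = (1 << 64) - 1
MASK128 = (1 << 128) - 1


def to_signed64(x):
    """Interpret a 64-bit register value as a 2's complement integer."""
    x &= MASK64
    return x - (1 << 64) if x >> 63 else x


def dmult(rs, rt):
    """Return (LO, HI): bits 63...0 and 127...64 of the signed product."""
    t = (to_signed64(rs) * to_signed64(rt)) & MASK128
    return t & MASK64, t >> 64


# -1 * 2 = -2: LO holds the low doubleword, HI is all ones (sign bits).
print(dmult(MASK64, 2))
```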
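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DMULTU        Doubleword Multiply Unsigned        DMULTU

 31      26 25      21 20      16 15                  6 5        0
+----------+----------+----------+---------------------+----------+
| SPECIAL  |    rs    |    rt    |          0          |  DMULTU  |
|  000000  |          |          |    00 0000 0000     |  011101  |
+----------+----------+----------+---------------------+----------+
     6          5          5               10               6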



Format: 

DMULTU rs, rt 

Description: 

The contents of general register rs and the contents of general register rt 
are multiplied, treating both operands as unsigned values. No overflow 
exception occurs under any circumstances. 

When the operation completes, the low-order word of the double result is 
loaded into special register LO, and the high-order word of the double 
result is loaded into special register HI. 

If either of the two preceding instructions is MFHI or MFLO, the results of 
these instructions are undefined. Correct operation requires separating 
reads of HI or LO from writes by a minimum of two instructions. 

This operation is only defined for the R4000 operating in 64-bit mode. 
Execution of this instruction in 32-bit mode causes a reserved instruction 
exception. 



Operation: 



64    T-2:  LO <- undefined
            HI <- undefined
      T-1:  LO <- undefined
            HI <- undefined
      T:    t  <- (0 || GPR[rs]) * (0 || GPR[rt])
            LO <- t_{63...0}
            HI <- t_{127...64}



Exceptions: 

Reserved instruction exception (R4000 in 32-bit mode) 
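
For the unsigned multiply, both operands are zero-extended before the product is formed, so the same bit pattern can give a different HI value than in the signed case. The Python sketch below illustrates this; the function name `dmultu` is an assumption made for the example.

```python
MASK64 = (1 << 64) - 1


def dmultu(rs, rt):
    """Return (LO, HI): bits 63...0 and 127...64 of the unsigned product."""
    t = (rs & MASK64) * (rt & MASK64)
    return t & MASK64, t >> 64


# 0xFFFF...FFFF is treated as 2^64 - 1 here, not as -1.
print(dmultu(MASK64, 2))
```

Here HI is 1, because the all-ones operand is treated as a large positive number rather than as -1.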
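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DSLL        Doubleword Shift Left Logical        DSLL

 31      26 25      21 20      16 15      11 10       6 5        0
+----------+----------+----------+----------+----------+----------+
| SPECIAL  |    0     |    rt    |    rd    |    sa    |   DSLL   |
|  000000  |  00000   |          |          |          |  111000  |
+----------+----------+----------+----------+----------+----------+
     6          5          5          5          5          6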



Format: 

DSLL rd, rt, sa 

Description: 

The contents of general register rt are shifted left by sa bits, inserting zeros 
into the low-order bits. The result is placed in register rd. 

Operation: 

64    T:    s <- 0 || sa
            GPR[rd] <- GPR[rt]_{(63-s)...0} || 0^{s}

Exceptions: 

Reserved instruction exception (R4000 in 32-bit mode) 
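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DSLLV        Doubleword Shift Left Logical Variable        DSLLV

 31      26 25      21 20      16 15      11 10       6 5        0
+----------+----------+----------+----------+----------+----------+
| SPECIAL  |    rs    |    rt    |    rd    |    0     |  DSLLV   |
|  000000  |          |          |          |  00000   |  010100  |
+----------+----------+----------+----------+----------+----------+
     6          5          5          5          5          6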



Format: 

DSLLV rd, rt, rs 

Description: 

The contents of general register rt are shifted left by the number of bits
specified by the low-order six bits contained in general register rs,
inserting zeros into the low-order bits. The result is placed in register rd.

This operation is only defined for the R4000 operating in 64-bit mode. 
Execution of this instruction in 32-bit mode causes a reserved instruction 
exception. 

Operation: 



64    T:    s <- GPR[rs]_{5...0}
            GPR[rd] <- GPR[rt]_{(63-s)...0} || 0^{s}



Exceptions: 

Reserved instruction exception (R4000 in 32-bit mode) 
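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DSLL32        Doubleword Shift Left Logical + 32        DSLL32

 31      26 25      21 20      16 15      11 10       6 5        0
+----------+----------+----------+----------+----------+----------+
| SPECIAL  |    0     |    rt    |    rd    |    sa    |  DSLL32  |
|  000000  |  00000   |          |          |          |  111100  |
+----------+----------+----------+----------+----------+----------+
     6          5          5          5          5          6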



Format: 

DSLL32 rd, rt, sa 

Description: 

The contents of general register rt are shifted left by 32+sa bits, inserting
zeros into the low-order bits. The result is placed in register rd.

This operation is only defined for the R4000 operating in 64-bit mode. 
Execution of this instruction in 32-bit mode causes a reserved instruction 
exception. 



Operation:

64    T:    s <- 1 || sa
            GPR[rd] <- GPR[rt]_{(63-s)...0} || 0^{s}




Exceptions: 

Reserved instruction exception (R4000 in 32-bit mode) 
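
The three doubleword left shifts (DSLL, DSLLV and DSLL32) differ only in where the shift amount comes from. The Python sketch below models them side by side; the function names are assumptions for illustration, and the 64-bit register is modelled by masking a Python integer.

```python
MASK64 = (1 << 64) - 1


def dsll(rt, sa):
    """Shift left by the 5-bit sa field (0...31)."""
    return (rt << (sa & 0x1F)) & MASK64


def dsllv(rt, rs):
    """Shift left by the low-order six bits of rs (0...63)."""
    return (rt << (rs & 0x3F)) & MASK64


def dsll32(rt, sa):
    """Shift left by 32 + sa (32...63)."""
    return (rt << (32 + (sa & 0x1F))) & MASK64


print(hex(dsll(1, 31)))
print(hex(dsllv(1, 64 + 3)))   # only the low six bits of rs are used
print(hex(dsll32(1, 31)))
```

DSLL32 exists because the 5-bit sa field cannot encode shift amounts above 31 on its own.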
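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DSRA        Doubleword Shift Right Arithmetic        DSRA

 31      26 25      21 20      16 15      11 10       6 5        0
+----------+----------+----------+----------+----------+----------+
| SPECIAL  |    0     |    rt    |    rd    |    sa    |   DSRA   |
|  000000  |  00000   |          |          |          |  111011  |
+----------+----------+----------+----------+----------+----------+
     6          5          5          5          5          6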



Format: 

DSRA rd, rt, sa 

Description: 

The contents of general register rt are shifted right by sa bits, sign- 
extending the high-order bits. The result is placed in register rd. 

This operation is only defined for the R4000 operating in 64-bit mode. 
Execution of this instruction in 32-bit mode causes a reserved instruction
exception.

Operation:



64    T:    s <- 0 || sa
            GPR[rd] <- (GPR[rt]_{63})^{s} || GPR[rt]_{63...s}



Exceptions: 

Reserved instruction exception (R4000 in 32-bit mode) 



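DSRAV        Doubleword Shift Right Arithmetic Variable        DSRAV

 31      26 25      21 20      16 15      11 10       6 5        0
+----------+----------+----------+----------+----------+----------+
| SPECIAL  |    rs    |    rt    |    rd    |    0     |  DSRAV   |
|  000000  |          |          |          |  00000   |  010111  |
+----------+----------+----------+----------+----------+----------+
     6          5          5          5          5          6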



Format: 

DSRAV rd, rt, rs 

Description: 

The contents of general register rt are shifted right by the number of bits
specified by the low-order six bits of general register rs, sign-extending the 
high-order bits. The result is placed in register rd. 

This operation is only defined for the R4000 operating in 64-bit mode. 
Execution of this instruction in 32-bit mode causes a reserved instruction 
exception. 

Operation: 

64    T:    s <- GPR[rs]_{5...0}
            GPR[rd] <- (GPR[rt]_{63})^{s} || GPR[rt]_{63...s}



Exceptions: 

Reserved instruction exception (R4000 in 32-bit mode) 
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DSRA32 



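Doubleword Shift Right Arithmetic + 32

 31     26 25     21 20     16 15     11 10      6 5       0
+---------+---------+---------+---------+---------+---------+
| SPECIAL |    0    |   rt    |   rd    |   sa    | DSRA32  |
| 000000  |  00000  |         |         |         | 111111  |
+---------+---------+---------+---------+---------+---------+
     6         5         5         5         5         6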



Format: 

DSRA32 rd, rt, sa 

Description: 

The contents of general register rt are shifted right by 32+sa bits,
sign-extending the high-order bits. The result is placed in register rd.

This operation is only defined for the R4000 operating in 64-bit mode. 
Execution of this instruction in 32-bit mode causes a reserved instruction 
exception. 



Operation:

64 T: s <- 1 || sa
      GPR[rd] <- (GPR[rt]_63)^s || GPR[rt]_63...s


Exceptions: 

Reserved instruction exception (R4000 in 32-bit mode) 
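
The three arithmetic right shifts above differ only in how the shift amount s is formed. The following is a minimal Python sketch of that behaviour, not taken from the manual: the function names and the use of Python integers to hold 64-bit register patterns are assumptions made for illustration, and the 32-bit-mode reserved instruction exception is not modelled.

```python
MASK64 = (1 << 64) - 1  # register values are held as unsigned 64-bit patterns


def shift_right_arithmetic(value: int, s: int) -> int:
    """Return the 64-bit pattern (GPR[rt]_63)^s || GPR[rt]_63...s."""
    if not 0 <= s <= 63:
        raise ValueError("shift amount must be in 0..63")
    value &= MASK64
    result = value >> s
    if value >> 63:
        # Fill the s vacated high-order bits with copies of bit 63.
        result |= (MASK64 << (64 - s)) & MASK64
    return result


def dsra(rt: int, sa: int) -> int:
    return shift_right_arithmetic(rt, sa & 0x1F)         # s = 0 || sa


def dsrav(rt: int, rs: int) -> int:
    return shift_right_arithmetic(rt, rs & 0x3F)         # s = GPR[rs]_5...0


def dsra32(rt: int, sa: int) -> int:
    return shift_right_arithmetic(rt, 32 + (sa & 0x1F))  # s = 1 || sa
```

For example, dsra(0x8000000000000000, 4) yields 0xF800000000000000, because the sign bit is replicated into the four vacated positions.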
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DSRL 



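Doubleword Shift Right Logical

 31     26 25     21 20     16 15     11 10      6 5       0
+---------+---------+---------+---------+---------+---------+
| SPECIAL |    0    |   rt    |   rd    |   sa    |  DSRL   |
| 000000  |  00000  |         |         |         | 111010  |
+---------+---------+---------+---------+---------+---------+
     6         5         5         5         5         6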



Format: 

DSRL rd, rt, sa 

Description: 

The contents of general register rt are shifted right by sa bits, inserting 
zeros into the high-order bits. The result is placed in register rd. 

This operation is only defined for the R4000 operating in 64-bit mode. 
Execution of this instruction in 32-bit mode causes a reserved instruction 
exception. 



Operation: 



64 T: s <- 0 || sa
      GPR[rd] <- 0^s || GPR[rt]_63...s



Exceptions: 

Reserved instruction exception (R4000 in 32-bit mode) 
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DSRLV 



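Doubleword Shift Right Logical Variable

 31     26 25     21 20     16 15     11 10      6 5       0
+---------+---------+---------+---------+---------+---------+
| SPECIAL |   rs    |   rt    |   rd    |    0    |  DSRLV  |
| 000000  |         |         |         |  00000  | 010110  |
+---------+---------+---------+---------+---------+---------+
     6         5         5         5         5         6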



Format: 

DSRLV rd, rt, rs 

Description: 

The contents of general register rt are shifted right by the number of bits
specified by the low-order six bits of general register rs, inserting zeros
into the high-order bits. The result is placed in register rd.

This operation is only defined for the R4000 operating in 64-bit mode. 
Execution of this instruction in 32-bit mode causes a reserved instruction 
exception. 

Operation: 



64 T: s <- GPR[rs]_5...0
      GPR[rd] <- 0^s || GPR[rt]_63...s



Exceptions: 

Reserved instruction exception (R4000 in 32-bit mode) 
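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DSRL32 Doubleword Shift Right Logical + 32 DSRL32

 31     26 25     21 20     16 15     11 10      6 5       0
+---------+---------+---------+---------+---------+---------+
| SPECIAL |    0    |   rt    |   rd    |   sa    | DSRL32  |
| 000000  |  00000  |         |         |         | 111110  |
+---------+---------+---------+---------+---------+---------+
     6         5         5         5         5         6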



Format: 

DSRL32 rd, rt, sa 

Description: 

The contents of general register rt are shifted right by 32+sa bits, inserting 
zeros into the high-order bits. The result is placed in register rd. 

This operation is only defined for the R4000 operating in 64-bit mode. 
Execution of this instruction in 32-bit mode causes a reserved instruction 
exception. 

Operation: 

64 T: s <- 1 || sa
      GPR[rd] <- 0^s || GPR[rt]_63...s

Exceptions: 

Reserved instruction exception (R4000 in 32-bit mode) 
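
The logical right shifts (DSRL, DSRLV, DSRL32) can be sketched the same way. This is a minimal illustration under the assumption that registers are modelled as unsigned 64-bit Python integers; the function names are hypothetical and the 32-bit-mode exception is not modelled.

```python
MASK64 = (1 << 64) - 1  # register values are held as unsigned 64-bit patterns


def dsrl(rt: int, sa: int) -> int:
    # s = 0 || sa; zeros enter at the high-order end
    return (rt & MASK64) >> (sa & 0x1F)


def dsrlv(rt: int, rs: int) -> int:
    # s = GPR[rs]_5...0; only the low-order six bits of rs are used
    return (rt & MASK64) >> (rs & 0x3F)


def dsrl32(rt: int, sa: int) -> int:
    # s = 1 || sa, that is, 32 + sa
    return (rt & MASK64) >> (32 + (sa & 0x1F))
```

Because the value is masked to 64 bits before shifting, Python's right shift inserts zeros, which matches the 0^s prefix in the operation notation.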
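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DSUB Doubleword Subtract DSUB

 31     26 25     21 20     16 15     11 10      6 5       0
+---------+---------+---------+---------+---------+---------+
| SPECIAL |   rs    |   rt    |   rd    |    0    |  DSUB   |
| 000000  |         |         |         |  00000  | 101110  |
+---------+---------+---------+---------+---------+---------+
     6         5         5         5         5         6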



Format: 

DSUB rd, rs, rt 

Description: 

The contents of general register rt are subtracted from the contents of 
general register rs to form a result. The result is placed into general 
register rd. 

The only difference between this instruction and the DSUBU instruction is 
that DSUBU never traps on overflow. 

An integer overflow exception takes place if the carries out of bits 62 and 
63 differ (2's complement overflow). The destination register rd is not 
modified when an integer overflow exception occurs. 

This operation is only defined for the R4000 operating in 64-bit mode. 
Execution of this instruction in 32-bit mode causes a reserved instruction 
exception. 

Operation: 

64 T: GPR[rd] <- GPR[rs] - GPR[rt] 



Exceptions: 

Integer overflow exception 

Reserved instruction exception (R4000 in 32-bit mode) 
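
The overflow rule for DSUB (carries out of bits 62 and 63 differ) is equivalent to the signed 64-bit result not being representable. A minimal Python sketch, assuming registers are modelled as unsigned 64-bit integers and using a hypothetical IntegerOverflow exception class in place of the processor exception:

```python
MASK64 = (1 << 64) - 1


class IntegerOverflow(Exception):
    """Stands in for the integer overflow exception (hypothetical name)."""


def to_signed64(value: int) -> int:
    value &= MASK64
    return value - (1 << 64) if value >> 63 else value


def dsub(rs: int, rt: int) -> int:
    result = to_signed64(rs) - to_signed64(rt)
    if not -(1 << 63) <= result < (1 << 63):
        # rd is left unmodified when the exception is taken
        raise IntegerOverflow
    return result & MASK64


def dsubu(rs: int, rt: int) -> int:
    # Same subtraction, but it never traps; the result wraps modulo 2**64.
    return (rs - rt) & MASK64
```

Subtracting 1 from 0x8000000000000000 overflows under dsub, while dsubu returns the wrapped value 0x7FFFFFFFFFFFFFFF.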
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DSUBU Doubleword Subtract Unsigned DSUBU 



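 31     26 25     21 20     16 15     11 10      6 5       0
+---------+---------+---------+---------+---------+---------+
| SPECIAL |   rs    |   rt    |   rd    |    0    |  DSUBU  |
| 000000  |         |         |         |  00000  | 101111  |
+---------+---------+---------+---------+---------+---------+
     6         5         5         5         5         6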



Format: 

DSUBU rd, rs, rt 

Description: 

The contents of general register rt are subtracted from the contents of
general register rs to form a result. The result is placed into general
register rd.

The only difference between this instruction and the DSUB instruction is
that DSUBU never traps on overflow. No integer overflow exception
occurs under any circumstances.

This operation is only defined for the R4000 operating in 64-bit mode. 
Execution of this instruction in 32-bit mode causes a reserved instruction 
exception. 

Operation: 

64 T: GPR[rd] <- GPR[rs] - GPR[rt] 



Exceptions: 

Reserved instruction exception (R4000 in 32-bit mode) 
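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ERET Exception Return ERET

 31     26  25  24                          6 5       0
+---------+----+-----------------------------+---------+
|  COP0   | CO |              0              |  ERET   |
| 010000  | 1  |     0000000000000000000     | 011000  |
+---------+----+-----------------------------+---------+
     6       1               19                   6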



Format: 

ERET 

Description: 

ERET is the R4000 instruction for returning from an interrupt, exception,
or error trap. Unlike a branch or jump instruction, ERET does not execute
the next instruction.

ERET must not itself be placed in a branch delay slot.

If the processor is servicing an error trap (SR_2 = 1), then load the PC from
the ErrorEPC and clear the ERL bit of the Status register (SR_2). Otherwise
(SR_2 = 0), load the PC from the EPC, and clear the EXL bit of the Status
register (SR_1).

An ERET executed between a LL and SC also causes the SC to fail.



Operation:

32, 64 T: if SR_2 = 1 then
              PC <- ErrorEPC
              SR <- SR_31...3 || 0 || SR_1...0
          else
              PC <- EPC
              SR <- SR_31...2 || 0 || SR_0
          endif
          LLbit <- 0





Exceptions: 

Coprocessor unusable exception 
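
The ERET behaviour described above can be sketched as a small state update. This is an illustrative model only: the Cpu class and its field names are hypothetical, and only the Status bits named in the description (ERL = bit 2, EXL = bit 1) are modelled.

```python
from dataclasses import dataclass

ERL = 1 << 2  # Status register bit 2
EXL = 1 << 1  # Status register bit 1


@dataclass
class Cpu:
    pc: int = 0
    status: int = 0
    epc: int = 0
    error_epc: int = 0
    llbit: int = 0


def eret(cpu: Cpu) -> None:
    if cpu.status & ERL:
        # Returning from an error trap: resume at ErrorEPC, clear ERL.
        cpu.pc = cpu.error_epc
        cpu.status &= ~ERL
    else:
        # Returning from an ordinary exception: resume at EPC, clear EXL.
        cpu.pc = cpu.epc
        cpu.status &= ~EXL
    # Clearing LLbit is what makes a following SC fail.
    cpu.llbit = 0
```

With both ERL and EXL set, a first eret() resumes at ErrorEPC and clears only ERL; a second eret() then resumes at EPC and clears EXL.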
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J Jump J 



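 31     26 25                                              0
+---------+-------------------------------------------------+
|    J    |                     target                      |
| 000010  |                                                 |
+---------+-------------------------------------------------+
     6                             26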





Format: 

J target 

Description: 

The 26-bit target address is shifted left two bits and combined with the
high-order bits of the address of the delay slot. The program
unconditionally jumps to this calculated address with a delay of one
instruction.



Operation:

32 T:   temp <- target
   T+1: PC <- PC_31...28 || temp || 0^2

64 T:   temp <- target
   T+1: PC <- PC_63...28 || temp || 0^2



Exceptions: 

None 
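
The address calculation for J can be sketched as follows. This is an illustration rather than the manual's own code: the function name and the bits parameter (32 or 64, selecting the addressing mode) are assumptions.

```python
def jump_target(delay_slot_pc: int, target: int, bits: int = 32) -> int:
    """Combine the high-order PC bits with the 26-bit target shifted left by 2."""
    mask = (1 << bits) - 1
    upper = delay_slot_pc & mask & ~0x0FFFFFFF  # PC_(bits-1)...28
    return upper | ((target & 0x03FFFFFF) << 2)  # temp || 0^2
```

A jump whose delay slot is at 0x80001004 with target field 0x400 therefore lands at 0x80001000; the destination always lies in the same 256 MB region as the delay slot.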
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JAL Jump And Link JAL 



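 31     26 25                                              0
+---------+-------------------------------------------------+
|   JAL   |                     target                      |
| 000011  |                                                 |
+---------+-------------------------------------------------+
     6                             26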





Format: 

JAL target 

Description: 

The 26-bit target address is shifted left two bits and combined with the
high-order bits of the address of the delay slot. The program
unconditionally jumps to this calculated address with a delay of one
instruction. The address of the instruction after the delay slot is placed in
the link register, r31.



Operation:

32 T:   temp <- target
        GPR[31] <- PC + 8
   T+1: PC <- PC_31...28 || temp || 0^2

64 T:   temp <- target
        GPR[31] <- PC + 8
   T+1: PC <- PC_63...28 || temp || 0^2



Exceptions: 

None 
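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JALR Jump And Link Register JALR

 31     26 25     21 20     16 15     11 10      6 5       0
+---------+---------+---------+---------+---------+---------+
| SPECIAL |   rs    |    0    |   rd    |    0    |  JALR   |
| 000000  |         |  00000  |         |  00000  | 001001  |
+---------+---------+---------+---------+---------+---------+
     6         5         5         5         5         6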



Format: 

JALR rs 
JALR rd, rs 

Description: 

The program unconditionally jumps to the address contained in general 
register rs, with a delay of one instruction. The address of the instruction 
after the delay slot is placed in general register rd. The default value of rd, 
if omitted in the assembly language instruction, is 31. 

Register specifiers rs and rd may not be equal, because such an instruction
does not have the same effect when re-executed. However, an attempt to
execute this instruction is not trapped, and the result of executing such an
instruction is undefined.

Since instructions must be word-aligned, a Jump and Link Register
instruction must specify a target register (rs) whose two low-order bits are
zero. If these low-order bits are not zero, an address exception will occur
when the jump target instruction is subsequently fetched.



Operation:

32, 64 T:   temp <- GPR[rs]
            GPR[rd] <- PC + 8
       T+1: PC <- temp



Exceptions: 

None 
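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JR Jump Register JR

 31     26 25     21 20                          6 5       0
+---------+---------+-----------------------------+---------+
| SPECIAL |   rs    |              0              |   JR    |
| 000000  |         |       000000000000000       | 001000  |
+---------+---------+-----------------------------+---------+
     6         5                  15                   6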



Format: 

JR rs 

Description: 

The program unconditionally jumps to the address contained in general 
register rs, with a delay of one instruction. 

Since instructions must be word-aligned, a Jump Register instruction 
must specify a target register (rs) whose two low-order bits are zero. If 
these low-order bits are not zero, an address exception will occur when the 
jump target instruction is subsequently fetched. 

Operation:

32, 64 T:   temp <- GPR[rs]
       T+1: PC <- temp



Exceptions: 

None 
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LB Load Byte LB 



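 31     26 25     21 20     16 15                          0
+---------+---------+---------+-----------------------------+
|   LB    |  base   |   rt    |           offset            |
| 100000  |         |         |                             |
+---------+---------+---------+-----------------------------+
     6         5         5                  16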





Format: 

LB rt, offset(base) 

Description: 

The 16-bit offset is sign-extended and added to the contents of general
register base to form a virtual address. The contents of the byte at the
memory location specified by the effective address are sign-extended and
loaded into general register rt.

Operation: 

32 T: vAddr <- ((offset_15)^16 || offset_15...0) + GPR[base]
      (pAddr, uncached) <- AddressTranslation (vAddr, DATA)
      pAddr <- pAddr_PSIZE-1...3 || (pAddr_2...0 xor ReverseEndian^3)
      mem <- LoadMemory (uncached, BYTE, pAddr, vAddr, DATA)
      byte <- vAddr_2...0 xor BigEndianCPU^3
      GPR[rt] <- (mem_7+8*byte)^24 || mem_7+8*byte...8*byte

64 T: vAddr <- ((offset_15)^48 || offset_15...0) + GPR[base]
      (pAddr, uncached) <- AddressTranslation (vAddr, DATA)
      pAddr <- pAddr_PSIZE-1...3 || (pAddr_2...0 xor ReverseEndian^3)
      mem <- LoadMemory (uncached, BYTE, pAddr, vAddr, DATA)
      byte <- vAddr_2...0 xor BigEndianCPU^3
      GPR[rt] <- (mem_7+8*byte)^56 || mem_7+8*byte...8*byte



Exceptions: 

TLB refill exception 
TLB invalid exception 
Bus error exception 
Address error exception 
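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LBU Load Byte Unsigned LBU

 31     26 25     21 20     16 15                          0
+---------+---------+---------+-----------------------------+
|   LBU   |  base   |   rt    |           offset            |
| 100100  |         |         |                             |
+---------+---------+---------+-----------------------------+
     6         5         5                  16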





Format: 

LBU rt, offset(base) 

Description: 

The 16-bit offset is sign-extended and added to the contents of general
register base to form a virtual address. The contents of the byte at the
memory location specified by the effective address are zero-extended and
loaded into general register rt.

Operation: 

32 T: vAddr <- ((offset_15)^16 || offset_15...0) + GPR[base]
      (pAddr, uncached) <- AddressTranslation (vAddr, DATA)
      pAddr <- pAddr_PSIZE-1...3 || (pAddr_2...0 xor ReverseEndian^3)
      mem <- LoadMemory (uncached, BYTE, pAddr, vAddr, DATA)
      byte <- vAddr_2...0 xor BigEndianCPU^3
      GPR[rt] <- 0^24 || mem_7+8*byte...8*byte

64 T: vAddr <- ((offset_15)^48 || offset_15...0) + GPR[base]
      (pAddr, uncached) <- AddressTranslation (vAddr, DATA)
      pAddr <- pAddr_PSIZE-1...3 || (pAddr_2...0 xor ReverseEndian^3)
      mem <- LoadMemory (uncached, BYTE, pAddr, vAddr, DATA)
      byte <- vAddr_2...0 xor BigEndianCPU^3
      GPR[rt] <- 0^56 || mem_7+8*byte...8*byte



Exceptions: 

TLB refill exception
TLB invalid exception
Bus error exception
Address error exception
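
The difference between LB and LBU lies only in how the loaded byte is widened. A minimal Python sketch, assuming a flat bytes object stands in for memory (no address translation, caching, or exceptions) and using hypothetical helper names:

```python
def sign_extend(value: int, bits: int, width: int = 64) -> int:
    """Sign-extend the low `bits` bits of value to a `width`-bit pattern."""
    value &= (1 << bits) - 1
    if value >> (bits - 1):
        value |= ((1 << width) - 1) & ~((1 << bits) - 1)
    return value


def effective_address(base: int, offset: int, width: int = 64) -> int:
    # The 16-bit offset is sign-extended before it is added to the base.
    return (base + sign_extend(offset, 16, width)) & ((1 << width) - 1)


def lb(memory: bytes, base: int, offset: int, width: int = 64) -> int:
    return sign_extend(memory[effective_address(base, offset, width)], 8, width)


def lbu(memory: bytes, base: int, offset: int, width: int = 64) -> int:
    return memory[effective_address(base, offset, width)]
```

Loading the byte 0x80 gives 0xFFFFFFFFFFFFFF80 with lb and 0x80 with lbu in the 64-bit case.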
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LD 



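Load Doubleword

 31     26 25     21 20     16 15                          0
+---------+---------+---------+-----------------------------+
|   LD    |  base   |   rt    |           offset            |
| 110111  |         |         |                             |
+---------+---------+---------+-----------------------------+
     6         5         5                  16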





Format: 

LD rt, offset(base) 

Description: 

The 16-bit offset is sign-extended and added to the contents of general
register base to form a virtual address. The contents of the 64-bit
doubleword at the memory location specified by the effective address are
loaded into general register rt.

If any of the three least-significant bits of the effective address are
non-zero, an address error exception occurs.

This operation is only defined for the R4000 operating in 64-bit mode. 
Execution of this instruction in 32-bit mode causes a reserved instruction 
exception. 



Operation:

64 T: vAddr <- ((offset_15)^48 || offset_15...0) + GPR[base]
      (pAddr, uncached) <- AddressTranslation (vAddr, DATA)
      mem <- LoadMemory (uncached, DOUBLEWORD, pAddr, vAddr, DATA)
      GPR[rt] <- mem



Exceptions: 

TLB refill exception 
TLB invalid exception 
Bus error exception 
Address error exception 

Reserved instruction exception (R4000 in 32-bit user mode,
R4000 in 32-bit supervisor mode)
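
The alignment rule for LD can be illustrated with a short sketch. It assumes a flat, big-endian bytes object as memory and a hypothetical AddressError class in place of the address error exception; address translation and the other exceptions are not modelled.

```python
MASK64 = (1 << 64) - 1


class AddressError(Exception):
    """Stands in for the address error exception (hypothetical name)."""


def sign_extend16(offset: int) -> int:
    offset &= 0xFFFF
    return offset - 0x10000 if offset & 0x8000 else offset


def ld(memory: bytes, base: int, offset: int) -> int:
    vaddr = (base + sign_extend16(offset)) & MASK64
    if vaddr & 0x7:
        # Any of the three least-significant address bits set: misaligned.
        raise AddressError(hex(vaddr))
    return int.from_bytes(memory[vaddr:vaddr + 8], "big")
```

An access at address 8 succeeds, while the same access with offset 1 raises AddressError because address 9 is not a multiple of eight.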






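LDCz Load Doubleword To Coprocessor LDCz

 31     26 25     21 20     16 15                          0
+---------+---------+---------+-----------------------------+
|  LDCz   |  base   |   rt    |           offset            |
| 1101xx* |         |         |                             |
+---------+---------+---------+-----------------------------+
     6         5         5                  16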





Format: 

LDCz rt, offset(base) 

Description: 

The 16-bit offset is sign-extended and added to the contents of general
register base to form a virtual address. The processor reads a doubleword
from the addressed memory location and makes the data available to
coprocessor unit z. The manner in which each coprocessor uses the data
is defined by the individual coprocessor specifications.

If any of the three least-significant bits of the effective address are
non-zero, an address error exception takes place.

This instruction is not valid for use with CP0.

This instruction is undefined when the least-significant bit of the
rt field is non-zero.



*See the table "Opcode Bit Encoding" on next page, or "CPU Instruction 
Opcode Bit Encoding" at the end of Appendix A. 
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LDCz 



Load Doubleword To Coprocessor 
(continued) 



LDCz 



Operation:

32 T: vAddr <- ((offset_15)^16 || offset_15...0) + GPR[base]
      (pAddr, uncached) <- AddressTranslation (vAddr, DATA)
      mem <- LoadMemory (uncached, DOUBLEWORD, pAddr, vAddr, DATA)
      COPzLD (rt, mem)

64 T: vAddr <- ((offset_15)^48 || offset_15...0) + GPR[base]
      (pAddr, uncached) <- AddressTranslation (vAddr, DATA)
      mem <- LoadMemory (uncached, DOUBLEWORD, pAddr, vAddr, DATA)
      COPzLD (rt, mem)



Exceptions: 

TLB refill exception 
TLB invalid exception 
Bus error exception 
Address error exception 
Coprocessor unusable exception 



Opcode Bit Encoding:

LDCz    Bit #  31  30  29  28  27  26  ...  0
LDC1            1   1   0   1   0   1
LDC2            1   1   0   1   1   0

Bits 31...28: Opcode
Bits 27...26: Coprocessor Unit Number
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LDL 



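Load Doubleword Left

 31     26 25     21 20     16 15                          0
+---------+---------+---------+-----------------------------+
|   LDL   |  base   |   rt    |           offset            |
| 011010  |         |         |                             |
+---------+---------+---------+-----------------------------+
     6         5         5                  16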





Format: 

LDL rt, offset(base)

Description:

This instruction can be used in combination with the LDR instruction to 
load a register with eight consecutive bytes from memory, when the bytes 
cross a doubleword boundary. LDL loads the left portion of the register 
with the appropriate part of the high-order doubleword; LDR loads the 
right portion of the register with the appropriate part of the low-order 
doubleword. 

The LDL instruction adds its sign-extended 16-bit offset to the contents of 
general register base to form a virtual address which can specify an 
arbitrary byte. It reads bytes only from the doubleword in memory which 
contains the specified starting byte. From one to eight bytes will be 
loaded, depending on the starting byte specified. 

Conceptually, it starts at the specified byte in memory and loads that byte 
into the high-order (left-most) byte of the register; then it loads bytes from 
memory into the register until it reaches the low-order byte of the 
doubleword in memory. The least-significant (right-most) byte(s) of the 
register will not be changed. 



memory (big-endian)

address 8     8   9  10  11  12  13  14  15
address 0     0   1   2   3   4   5   6   7

register $24

before            A   B   C   D   E   F   G   H
LDL $24,3($0)     3   4   5   6   7   F   G   H
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LDL 



Load Doubleword Left 
(continued) 



LDL 



The contents of general register rt are internally bypassed within the
processor so that no NOP is needed between an immediately preceding 
load instruction which specifies register rt and a following LDL (or LDR) 
instruction which also specifies register rt. 

No address exceptions due to alignment are possible. 

This operation is only defined for the R4000 operating in 64-bit mode. 
Execution of this instruction in 32-bit mode causes a reserved instruction 
exception. 

Operation: 



64 T: vAddr <- ((offset_15)^48 || offset_15...0) + GPR[base]
      (pAddr, uncached) <- AddressTranslation (vAddr, DATA)
      pAddr <- pAddr_PSIZE-1...3 || (pAddr_2...0 xor ReverseEndian^3)
      if BigEndianMem = 0 then
          pAddr <- pAddr_PSIZE-1...3 || 0^3
      endif
      byte <- vAddr_2...0 xor BigEndianCPU^3
      mem <- LoadMemory (uncached, byte, pAddr, vAddr, DATA)
      GPR[rt] <- mem_7+8*byte...0 || GPR[rt]_55-8*byte...0
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LDL 



Load Doubleword Left 
(continued) 



LDL 



Given a doubleword in a register and a doubleword in memory, the 
operation of LDL is as follows: 



LDL

Register    A B C D E F G H
Memory      I J K L M N O P

              BigEndianCPU = 0                     BigEndianCPU = 1
                                     offset                               offset
vAddr_2...0   destination      type  LEM BEM       destination      type  LEM BEM
     0        P B C D E F G H   0     0   7        I J K L M N O P   7     0   0
     1        O P C D E F G H   1     0   6        J K L M N O P H   6     0   1
     2        N O P D E F G H   2     0   5        K L M N O P G H   5     0   2
     3        M N O P E F G H   3     0   4        L M N O P F G H   4     0   3
     4        L M N O P F G H   4     0   3        M N O P E F G H   3     0   4
     5        K L M N O P G H   5     0   2        N O P D E F G H   2     0   5
     6        J K L M N O P H   6     0   1        O P C D E F G H   1     0   6
     7        I J K L M N O P   7     0   0        P B C D E F G H   0     0   7

LEM       Little-endian memory (BigEndianMem = 0)
BEM       BigEndianMem = 1
Type      AccessType (see Table 2-1) sent to memory
Offset    pAddr_2...0 sent to memory



Exceptions: 

TLB refill exception 
TLB invalid exception 
Bus error exception 
Address error exception 

Reserved instruction exception (R4000 in 32-bit mode) 
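
The big-endian case of LDL, including the LDL $24,3($0) example above, can be reproduced with a short sketch. It is illustrative only: it assumes a big-endian CPU and memory, models the register as eight bytes with the most significant byte first, and uses a hypothetical function name.

```python
def ldl_big_endian(memory: bytes, reg: bytes, vaddr: int) -> bytes:
    """Merge bytes from vaddr up to the end of its doubleword into the
    high-order (left-most) bytes of an 8-byte register image."""
    dword_end = (vaddr | 7) + 1          # first address past the doubleword
    loaded = memory[vaddr:dword_end]     # from one to eight bytes
    # The least-significant bytes of the register are left unchanged.
    return loaded + reg[len(loaded):]


memory = bytes(range(16))                # byte at address n holds the value n
before = b"ABCDEFGH"
after = ldl_big_endian(memory, before, 3)
```

Here after holds the bytes 3, 4, 5, 6, 7 followed by F, G, H, matching the register contents shown in the example.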
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LDR 



Load Doubleword Right 



LDR 



31 26 


25 21 


20 16 
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LDR 
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base 


rt 


offset 
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5 


5 




16 





Format: 

LDR rt, offset(base)

Description:

This instruction can be used in combination with the LDL instruction to 
load a register with eight consecutive bytes from memory, when the bytes 
cross a doubleword boundary. LDR loads the right portion of the register 
with the appropriate part of the low-order doubleword; LDL loads the left 
portion of the register with the appropriate part of the high-order 
doubleword. 

The LDR instruction adds its sign-extended 16-bit offset to the contents of
general register base to form a virtual address which can specify an 
arbitrary byte. It reads bytes only from the doubleword in memory which 
contains the specified starting byte. From one to eight bytes will be 
loaded, depending on the starting byte specified. 

Conceptually, it starts at the specified byte in memory and loads that byte 
into the low-order (right-most) byte of the register; then it loads bytes from 
memory into the register until it reaches the high-order byte of the 
doubleword in memory. The most significant (left-most) byte(s) of the 
register will not be changed. 



address 8 
address 0 



memory 

(big-endian) 
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LDR 



Load Doubleword Right 
(continued) 



LDR 



The contents of general register rt are internally bypassed within the 
processor so that no NOP is needed between an immediately preceding 
load instruction which specifies register rt and a following LDR (or LDL) 
instruction which also specifies register rt. 

No address exceptions due to alignment are possible. 

This operation is only defined for the R4000 operating in 64-bit mode. 
Execution of this instruction in 32-bit mode causes a reserved instruction 
exception. 



Operation: 

64 T: vAddr <- ((offset_15)^48 || offset_15...0) + GPR[base]
      (pAddr, uncached) <- AddressTranslation (vAddr, DATA)
      pAddr <- pAddr_PSIZE-1...3 || (pAddr_2...0 xor ReverseEndian^3)
      if BigEndianMem = 1 then
          pAddr <- pAddr_31...3 || 0^3
      endif
      byte <- vAddr_2...0 xor BigEndianCPU^3
      mem <- LoadMemory (uncached, byte, pAddr, vAddr, DATA)
      GPR[rt] <- GPR[rt]_63...64-8*byte || mem_63...8*byte
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LDR 



Load Doubleword Right 
(continued) 



LDR 



Given a doubleword in a register and a doubleword in memory, the 
operation of LDR is as follows: 



LDR 




Register 
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Memory 
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LEM       Little-endian memory (BigEndianMem = 0)
BEM       BigEndianMem = 1
Type      AccessType (see Table 2-1) sent to memory
Offset    pAddr_2...0 sent to memory



Exceptions: 

TLB refill exception 
TLB invalid exception 
Bus error exception 
Address error exception 

Reserved instruction exception (R4000 in 32-bit mode) 
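
LDR is the mirror image of LDL, and the two together assemble an unaligned doubleword. The sketch below is an illustration under stated assumptions: big-endian CPU and memory, the register modelled as eight bytes with the most significant byte first, and hypothetical function names. Pairing LDL at address A with LDR at address A+7 is the usual big-endian idiom and is shown here only as an example.

```python
def ldl_big_endian(memory: bytes, reg: bytes, vaddr: int) -> bytes:
    # Bytes from vaddr to the end of its doubleword fill the left of the register.
    loaded = memory[vaddr:(vaddr | 7) + 1]
    return loaded + reg[len(loaded):]


def ldr_big_endian(memory: bytes, reg: bytes, vaddr: int) -> bytes:
    # Bytes from the start of the doubleword up to vaddr fill the right of
    # the register; the most significant bytes are left unchanged.
    loaded = memory[vaddr & ~7:vaddr + 1]
    return reg[:8 - len(loaded)] + loaded


def load_unaligned_doubleword(memory: bytes, vaddr: int) -> bytes:
    reg = bytes(8)
    reg = ldl_big_endian(memory, reg, vaddr)      # high-order part
    reg = ldr_big_endian(memory, reg, vaddr + 7)  # low-order part
    return reg
```

With memory holding the value n at address n, load_unaligned_doubleword(memory, 3) returns the eight bytes 3 through 10, which straddle the doubleword boundary at address 8.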
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Format: 

LH rt, offset(base) 



Description: 

The 16-bit offset is sign-extended and added to the contents of general
register base to form a virtual address. The contents of the halfword at the
memory location specified by the effective address are sign-extended and
loaded into general register rt.

If the least-significant bit of the effective address is non-zero, an address
error exception occurs.

Operation: 

32 T: vAddr <- ((offset_15)^16 || offset_15...0) + GPR[base]
      (pAddr, uncached) <- AddressTranslation (vAddr, DATA)
      pAddr <- pAddr_PSIZE-1...3 || (pAddr_2...0 xor (ReverseEndian^2 || 0))
      mem <- LoadMemory (uncached, HALFWORD, pAddr, vAddr, DATA)
      byte <- vAddr_2...0 xor (BigEndianCPU^2 || 0)
      GPR[rt] <- (mem_15+8*byte)^16 || mem_15+8*byte...8*byte

64 T: vAddr <- ((offset_15)^48 || offset_15...0) + GPR[base]
      (pAddr, uncached) <- AddressTranslation (vAddr, DATA)
      pAddr <- pAddr_PSIZE-1...3 || (pAddr_2...0 xor (ReverseEndian^2 || 0))
      mem <- LoadMemory (uncached, HALFWORD, pAddr, vAddr, DATA)
      byte <- vAddr_2...0 xor (BigEndianCPU^2 || 0)
      GPR[rt] <- (mem_15+8*byte)^48 || mem_15+8*byte...8*byte



Exceptions: 

TLB refill exception 
TLB invalid exception 
Bus error exception 
Address error exception 
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Format: 

LHU rt, offset(base) 

Description: 

The 16-bit offset is sign-extended and added to the contents of general
register base to form a virtual address. The contents of the halfword at the 
memory location specified by the effective address are zero-extended and 
loaded into general register rt. 

If the least-significant bit of the effective address is non-zero, an address 
error exception occurs. 

Operation: 

32 T: vAddr <- ((offset_15)^16 || offset_15...0) + GPR[base]
      (pAddr, uncached) <- AddressTranslation (vAddr, DATA)
      pAddr <- pAddr_PSIZE-1...3 || (pAddr_2...0 xor (ReverseEndian^2 || 0))
      mem <- LoadMemory (uncached, HALFWORD, pAddr, vAddr, DATA)
      byte <- vAddr_2...0 xor (BigEndianCPU^2 || 0)
      GPR[rt] <- 0^16 || mem_15+8*byte...8*byte

64 T: vAddr <- ((offset_15)^48 || offset_15...0) + GPR[base]
      (pAddr, uncached) <- AddressTranslation (vAddr, DATA)
      pAddr <- pAddr_PSIZE-1...3 || (pAddr_2...0 xor (ReverseEndian^2 || 0))
      mem <- LoadMemory (uncached, HALFWORD, pAddr, vAddr, DATA)
      byte <- vAddr_2...0 xor (BigEndianCPU^2 || 0)
      GPR[rt] <- 0^48 || mem_15+8*byte...8*byte



Exceptions: 

TLB refill exception
TLB invalid exception
Bus error exception
Address error exception
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LL Load Linked LL 
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Format: 

LL rt, offset(base) 

Description: 

The 16-bit offset is sign-extended and added to the contents of general
register base to form a virtual address. The contents of the word at the
memory location specified by the effective address are loaded into general
register rt. In 64-bit mode, the loaded word is sign-extended.

The processor begins checking the accessed word for modification by
other processors and devices.

Load Linked and Store Conditional can be used to atomically update 
memory locations as shown: 



L1:
        LL      T1, (T0)
        ADD     T2, T1, 1
        SC      T2, (T0)
        BEQ     T2, 0, L1
        NOP





This atomically increments the word addressed by T0. Changing the ADD
to an OR changes this to an atomic bit set. This instruction is available in
User mode, and it is not necessary for CP0 to be enabled.

The operation of LL is undefined if the addressed location is uncached 
and, for synchronization between multiple processors, the operation of LL 
is undefined if the addressed location is noncoherent. A cache miss that 
occurs between LL and SC may cause SC to fail, so no load or store 
operation should occur between LL and SC, otherwise the SC may never 
be successful. Exceptions also cause SC to fail, so persistent exceptions 
must be avoided. If either of the two least-significant bits of the effective 
address are non-zero, an address error exception takes place. 
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(continued) 



Operation: 

32 T: vAddr <- ((offset_15)^16 || offset_15...0) + GPR[base]
      (pAddr, uncached) <- AddressTranslation (vAddr, DATA)
      pAddr <- pAddr_PSIZE-1...3 || (pAddr_2...0 xor (ReverseEndian || 0^2))
      mem <- LoadMemory (uncached, WORD, pAddr, vAddr, DATA)
      byte <- vAddr_2...0 xor (BigEndianCPU || 0^2)
      GPR[rt] <- mem_31+8*byte...8*byte
      LLbit <- 1

64 T: vAddr <- ((offset_15)^48 || offset_15...0) + GPR[base]
      (pAddr, uncached) <- AddressTranslation (vAddr, DATA)
      pAddr <- pAddr_PSIZE-1...3 || (pAddr_2...0 xor (ReverseEndian || 0^2))
      mem <- LoadMemory (uncached, WORD, pAddr, vAddr, DATA)
      byte <- vAddr_2...0 xor (BigEndianCPU || 0^2)
      GPR[rt] <- (mem_31+8*byte)^32 || mem_31+8*byte...8*byte
      LLbit <- 1



Exceptions: 

TLB refill exception 
TLB invalid exception 
Bus error exception 
Address error exception 
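
The retry loop built from LL and SC can be simulated in a few lines. This is a toy model, not a description of the hardware: the LinkedMemory class, its external_write method, and the rule that a write by another agent to the linked address clears LLbit are simplifying assumptions made for illustration.

```python
class LinkedMemory:
    """Toy word memory with a single link bit (hypothetical model)."""

    def __init__(self):
        self.words = {}
        self.llbit = 0
        self.linked_addr = None

    def ll(self, addr):
        # Load the word and start watching it.
        self.llbit = 1
        self.linked_addr = addr
        return self.words.get(addr, 0)

    def sc(self, addr, value):
        # Store only if the link is still intact; return 1 on success, 0 on failure.
        if self.llbit and addr == self.linked_addr:
            self.words[addr] = value & 0xFFFFFFFF
            return 1
        return 0

    def external_write(self, addr, value):
        # A write by another processor or device breaks the link.
        self.words[addr] = value & 0xFFFFFFFF
        if addr == self.linked_addr:
            self.llbit = 0


def atomic_increment(mem, addr, interfere=None):
    """Mirror of the LL / ADD / SC / BEQ loop; returns the number of attempts."""
    attempts = 0
    while True:
        attempts += 1
        t1 = mem.ll(addr)
        if interfere is not None and attempts == 1:
            interfere()  # simulate another agent modifying the word
        t2 = (t1 + 1) & 0xFFFFFFFF
        if mem.sc(addr, t2):
            return attempts
```

When nothing interferes the loop succeeds on the first attempt; when another agent writes the word between LL and SC, the SC fails and the loop retries with the newly written value.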
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LLD 



Load Linked Doubleword 



LLD 
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Format: 

LLD rt, offset(base) 

Description: 

The 16-bit offset is sign-extended and added to the contents of general
register base to form a virtual address. The contents of the doubleword at
the memory location specified by the effective address are loaded into
general register rt.

The processor begins checking the accessed word for modification by
other processors and devices.

Load Linked Doubleword and Store Conditional Doubleword can be used 
to atomically update memory locations: 



L1:
        LLD     T1, (T0)
        ADD     T2, T1, 1
        SCD     T2, (T0)
        BEQ     T2, 0, L1
        NOP





This atomically increments the word addressed by T0. Changing the ADD
to an OR changes this to an atomic bit set.



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Load Linked Doubleword 
(continued) 



LLD 



The operation of LLD is undefined if the addressed location is uncached
and, for synchronization between multiple processors, the operation of
LLD is undefined if the addressed location is noncoherent. A cache miss
that occurs between LLD and SCD may cause SCD to fail, so no load or
store operation should occur between LLD and SCD, otherwise the SCD
may never be successful. Exceptions also cause SCD to fail, so persistent
exceptions must be avoided.

This instruction is available in User mode, and it is not necessary for CP0
to be enabled.

If any of the three least-significant bits of the effective address are
non-zero, an address error exception takes place.

This operation is only defined for the R4000 operating in 64-bit mode. 
Execution of this instruction in 32-bit mode causes a reserved instruction 
exception. 



Operation: 

64 T: vAddr <- ((offset_15)^48 || offset_15...0) + GPR[base]
      (pAddr, uncached) <- AddressTranslation (vAddr, DATA)
      mem <- LoadMemory (uncached, DOUBLEWORD, pAddr, vAddr, DATA)
      GPR[rt] <- mem
      LLbit <- 1

Exceptions:

TLB refill exception
TLB invalid exception
Bus error exception
Address error exception
Reserved instruction exception (R4000 in 32-bit mode)
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Load Upper Immediate 
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Format: 

LUI rt, immediate 

Description: 

The 16-bit immediate is shifted left 16 bits and concatenated to 16 bits of 
zeros. The result is placed into general register rt. In 64-bit mode, the 
loaded word is sign-extended. 

Operation: 



32 T: GPR[rt] <- immediate || 0^16

64 T: GPR[rt] <- (immediate_15)^32 || immediate || 0^16



Exceptions: 

None 
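
The two forms of the LUI operation differ only in the sign extension applied in 64-bit mode. A minimal sketch, with a hypothetical function name and a mode64 flag introduced for illustration:

```python
def lui(immediate: int, mode64: bool = True) -> int:
    """Return the register pattern produced by LUI for a 16-bit immediate."""
    word = (immediate & 0xFFFF) << 16      # immediate || 0^16
    if mode64 and word >> 31:
        # (immediate_15)^32: replicate bit 15 of the immediate upward.
        word |= 0xFFFFFFFF00000000
    return word
```

An immediate of 0x8000 therefore produces 0x80000000 in 32-bit mode and 0xFFFFFFFF80000000 in 64-bit mode.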
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Format: 

LW rt, offset(base) 

Description: 

The 16-bit offset is sign-extended and added to the contents of general
register base to form a virtual address. The contents of the word at the 
memory location specified by the effective address are loaded into general 
register rt. In 64-bit mode, the loaded word is sign-extended. If either of 
the two least-significant bits of the effective address is non-zero, an 
address error exception occurs. 

Operation: 

32 T: vAddr <- ((offset_15)^16 || offset_15...0) + GPR[base]
      (pAddr, uncached) <- AddressTranslation (vAddr, DATA)
      pAddr <- pAddr_PSIZE-1...3 || (pAddr_2...0 xor (ReverseEndian || 0^2))
      mem <- LoadMemory (uncached, WORD, pAddr, vAddr, DATA)
      byte <- vAddr_2...0 xor (BigEndianCPU || 0^2)
      GPR[rt] <- mem_31+8*byte...8*byte

64 T: vAddr <- ((offset_15)^48 || offset_15...0) + GPR[base]
      (pAddr, uncached) <- AddressTranslation (vAddr, DATA)
      pAddr <- pAddr_PSIZE-1...3 || (pAddr_2...0 xor (ReverseEndian || 0^2))
      mem <- LoadMemory (uncached, WORD, pAddr, vAddr, DATA)
      byte <- vAddr_2...0 xor (BigEndianCPU || 0^2)
      GPR[rt] <- (mem_31+8*byte)^32 || mem_31+8*byte...8*byte

Exceptions: 

TLB refill exception
TLB invalid exception
Bus error exception
Address error exception
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Format: 

LWCz rt, offset(base) 

Description: 

The 16-bit offset is sign-extended and added to the contents of general
register base to form a virtual address. The processor reads a word from
the addressed memory location, and makes the data available to
coprocessor unit z.

The manner in which each coprocessor uses the data is defined by the
individual coprocessor specifications.

If either of the two least-significant bits of the effective address is non-zero,
an address error exception occurs.

This instruction is not valid for use with CP0.






*See the table "Opcode Bit Encoding" on next page, or "CPU Instruction 
Opcode Bit Encoding" at the end of Appendix A. 
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Load Word To Coprocessor 



Operation: 

32 T: vAddr <- ((offset_15)^16 || offset_15...0) + GPR[base]
      (pAddr, uncached) <- AddressTranslation (vAddr, DATA)
      pAddr <- pAddr_PSIZE-1...3 || (pAddr_2...0 xor (ReverseEndian || 0^2))
      mem <- LoadMemory (uncached, WORD, pAddr, vAddr, DATA)
      byte <- vAddr_2...0 xor (BigEndianCPU || 0^2)
      COPzLW (byte, rt, mem)

64 T: vAddr <- ((offset_15)^48 || offset_15...0) + GPR[base]
      (pAddr, uncached) <- AddressTranslation (vAddr, DATA)
      pAddr <- pAddr_PSIZE-1...3 || (pAddr_2...0 xor (ReverseEndian || 0^2))
      mem <- LoadMemory (uncached, WORD, pAddr, vAddr, DATA)
      byte <- vAddr_2...0 xor (BigEndianCPU || 0^2)
      COPzLW (byte, rt, mem)



Exceptions: 

TLB refill exception TLB invalid exception 

Bus error exception Address error exception 

Coprocessor unusable exception 

Opcode Bit Encoding: 

LWCz   Bit #  31  30  29  28  27  26
LWC1           1   1   0   0   0   1
LWC2           1   1   0   0   1   0

Bits 31...28: Opcode 
Bits 27...26: Coprocessor Unit Number 
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LWL 



Load Word Left 



LWL 
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Format: 

LWL rt, offset(base) 

Description: 

This instruction can be used in combination with the LWR instruction to 
load a register with four consecutive bytes from memory, when the bytes 
cross a word boundary. LWL loads the left portion of the register with the 
appropriate part of the high-order word; LWR loads the right portion of 
the register with the appropriate part of the low-order word. 

The LWL instruction adds its sign-extended 16-bit offset to the contents of 
general register base to form a virtual address which can specify an 
arbitrary byte. It reads bytes only from the word in memory which 
contains the specified starting byte. From one to four bytes will be loaded, 
depending on the starting byte specified. In 64-bit mode, the loaded word 
is sign-extended. 

Conceptually, it starts at the specified byte in memory and loads that byte 
into the high-order (left-most) byte of the register; then it loads bytes from 
memory into the register until it reaches the low-order byte of the word in 
memory. The least-significant (right-most) byte(s) of the register will not 
be changed. 
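
The conceptual description above can be modelled in a few lines. This is a sketch only, assuming a big-endian CPU in 32-bit mode, a flat byte-addressed memory, and no address translation or exceptions; the function name `lwl_big_endian` is hypothetical.

```python
def lwl_big_endian(reg, memory, addr):
    """Model LWL: load from addr up to the end of its word into the left of reg."""
    n = 4 - (addr & 3)                       # bytes from addr to the end of the word
    loaded = int.from_bytes(memory[addr:addr + n], "big")
    keep_bits = 8 * (4 - n)                  # low-order register bits left unchanged
    kept = reg & ((1 << keep_bits) - 1)
    return ((loaded << keep_bits) | kept) & 0xFFFFFFFF
```

With memory bytes 00 01 02 03 at addresses 0 through 3 and a register holding 0xAABBCCDD, an LWL at address 1 loads three bytes and leaves the low-order byte DD unchanged.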



[Figure: LWL example with big-endian memory, showing the words at address 0 and address 4] 
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Load Word Left 



The contents of general register rt are internally bypassed within the 
processor so that no NOP is needed between an immediately preceding 
load instruction which specifies register rt and a following LWL (or LWR) 
instruction which also specifies register rt. No address exceptions due to 
alignment are possible. 



Operation: 

32 T: vAddr <- ((offset_15)^16 || offset_15...0) + GPR[base]
      (pAddr, uncached) <- AddressTranslation (vAddr, DATA)
      pAddr <- pAddr_PSIZE-1...3 || (pAddr_2...0 xor ReverseEndian^3)
      if BigEndianMem = 0 then
          pAddr <- pAddr_PSIZE-1...2 || 0^2
      endif
      byte <- vAddr_1...0 xor BigEndianCPU^2
      word <- vAddr_2 xor BigEndianCPU
      mem <- LoadMemory (uncached, 0 || byte, pAddr, vAddr, DATA)
      temp <- mem_32*word+8*byte+7...32*word || GPR[rt]_23-8*byte...0
      GPR[rt] <- temp

64 T: vAddr <- ((offset_15)^48 || offset_15...0) + GPR[base]
      (pAddr, uncached) <- AddressTranslation (vAddr, DATA)
      pAddr <- pAddr_PSIZE-1...3 || (pAddr_2...0 xor ReverseEndian^3)
      if BigEndianMem = 0 then
          pAddr <- pAddr_PSIZE-1...2 || 0^2
      endif
      byte <- vAddr_1...0 xor BigEndianCPU^2
      word <- vAddr_2 xor BigEndianCPU
      mem <- LoadMemory (uncached, 0 || byte, pAddr, vAddr, DATA)
      temp <- mem_32*word+8*byte+7...32*word || GPR[rt]_23-8*byte...0
      GPR[rt] <- (temp_31)^32 || temp
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LWL 






Given a doubleword in a register and a doubleword in memory, the 
operation of LWL is as follows: 



LWL 

LEM     Little-endian memory (BigEndianMem = 0) 
BEM     BigEndianMem = 1 
Type    AccessType (see Table 2-1) sent to memory 
Offset  pAddr_2...0 sent to memory 
S       sign-extend of destination_31 



Exceptions: 

TLB refill exception 
TLB invalid exception 
Bus error exception 
Address error exception 
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LWR Load Word Right LWR 
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Format: 

LWR rt, offset(base) 

Description: 

This instruction can be used in combination with the LWL instruction to 
load a register with four consecutive bytes from memory, when the bytes 
cross a word boundary. LWR loads the right portion of the register with 
the appropriate part of the low-order word; LWL loads the left portion of 
the register with the appropriate part of the high-order word. 

The LWR instruction adds its sign-extended 16-bit offset to the contents of 
general register base to form a virtual address which can specify an 
arbitrary byte. It reads bytes only from the word in memory which 
contains the specified starting byte. From one to four bytes will be loaded, 
depending on the starting byte specified. In 64-bit mode, if bit 31 of the 
destination register is loaded, then the loaded word is sign-extended. 

Conceptually, it starts at the specified byte in memory and loads that byte 
into the low-order (right-most) byte of the register; then it loads bytes from 
memory into the register until it reaches the high-order byte of the word 
in memory. The most-significant (left-most) byte(s) of the register will not 
be changed. 
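
As an illustration of the right-hand half of an unaligned load, the following sketch models LWR under the same simplifying assumptions as the text's conceptual description: a big-endian CPU in 32-bit mode, flat memory, no translation and no exceptions. The function name `lwr_big_endian` is hypothetical.

```python
def lwr_big_endian(reg, memory, addr):
    """Model LWR: load from the start of the word through addr into the right of reg."""
    n = (addr & 3) + 1                       # bytes from the word start through addr
    start = addr & ~3                        # address of the first byte of the word
    loaded = int.from_bytes(memory[start:start + n], "big")
    kept = (reg >> (8 * n)) << (8 * n)       # high-order register bits left unchanged
    return (kept | loaded) & 0xFFFFFFFF
```

If a preceding LWL at address 1 has already placed bytes 01 02 03 in the upper three bytes of the register, an LWR at address 4 (the last byte of the unaligned word) fills in the remaining low-order byte.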



[Figure: LWR example with big-endian memory, showing the words at address 0 and address 4] 
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LWR 






The contents of general register rt are internally bypassed within the 
processor so that no NOP is needed between an immediately preceding 
load instruction which specifies register rt and a following LWR (or LWL) 
instruction which also specifies register rt. No address exceptions due to 
alignment are possible. 



Operation: 

32 T: vAddr <- ((offset_15)^16 || offset_15...0) + GPR[base]
      (pAddr, uncached) <- AddressTranslation (vAddr, DATA)
      pAddr <- pAddr_PSIZE-1...3 || (pAddr_2...0 xor ReverseEndian^3)
      if BigEndianMem = 1 then
          pAddr <- pAddr_PSIZE-31...3 || 0^3
      endif
      byte <- vAddr_1...0 xor BigEndianCPU^2
      word <- vAddr_2 xor BigEndianCPU
      mem <- LoadMemory (uncached, 0 || byte, pAddr, vAddr, DATA)
      temp <- GPR[rt]_31...32-8*byte || mem_31+32*word...32*word+8*byte
      GPR[rt] <- temp

64 T: vAddr <- ((offset_15)^48 || offset_15...0) + GPR[base]
      (pAddr, uncached) <- AddressTranslation (vAddr, DATA)
      pAddr <- pAddr_PSIZE-1...3 || (pAddr_2...0 xor ReverseEndian^3)
      if BigEndianMem = 1 then
          pAddr <- pAddr_PSIZE-31...3 || 0^3
      endif
      byte <- vAddr_1...0 xor BigEndianCPU^2
      word <- vAddr_2 xor BigEndianCPU
      mem <- LoadMemory (uncached, 0 || byte, pAddr, vAddr, DATA)
      temp <- GPR[rt]_31...32-8*byte || mem_31+32*word...32*word+8*byte
      GPR[rt] <- (temp_31)^32 || temp
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Load Word Right 



Given a word in a register and a word in memory, the operation of LWR 
is as follows: 



LWR 

LEM     Little-endian memory (BigEndianMem = 0) 
BEM     BigEndianMem = 1 
Type    AccessType (see Table 2-1) sent to memory 
Offset  pAddr_2...0 sent to memory 
S       sign-extend of destination_31 
X       either unchanged or sign-extend of destination_31 



Exceptions: 

TLB refill exception 
TLB invalid exception 

Bus error exception 
Address error exception 
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Load Word Unsigned 
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Format: 

LWU rt, offset(base) 

Description: 

The 16-bit offset is sign-extended and added to the contents of general 
register base to form a virtual address. The contents of the word at the 
memory location specified by the effective address are loaded into general 
register rt. The loaded word is zero-extended. 

If either of the two least-significant bits of the effective address is non-zero, 
an address error exception occurs. 

This operation is only defined for the R4000 operating in 64-bit mode. 
Execution of this instruction in 32-bit mode causes a reserved instruction 
exception. 
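
The difference between the zero-extension performed by LWU and the sign-extension performed by LW in 64-bit mode can be seen in a short sketch. The function names below are hypothetical, and only the register result is modelled, not the memory access.

```python
MASK32 = 0xFFFFFFFF
UPPER32 = 0xFFFFFFFF00000000


def lwu_result(word32):
    """64-bit register value after LWU: upper 32 bits are zero."""
    return word32 & MASK32


def lw_result(word32):
    """64-bit register value after LW: upper 32 bits copy bit 31 of the word."""
    word32 &= MASK32
    return (word32 | UPPER32) if (word32 & 0x80000000) else word32
```

The two results differ only when bit 31 of the loaded word is set.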

Operation: 

64 T: vAddr <- ((offset_15)^48 || offset_15...0) + GPR[base]
      (pAddr, uncached) <- AddressTranslation (vAddr, DATA)
      pAddr <- pAddr_PSIZE-1...3 || (pAddr_2...0 xor (ReverseEndian || 0^2))
      mem <- LoadMemory (uncached, WORD, pAddr, vAddr, DATA)
      byte <- vAddr_2...0 xor (BigEndianCPU || 0^2)
      GPR[rt] <- 0^32 || mem_31+8*byte...8*byte



Exceptions: 

TLB refill exception 
TLB invalid exception 
Bus error exception 
Address error exception 

Reserved instruction exception (R4000 in 32-bit mode) 
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MFCO System Control Coprocessor MFCO 



Bits:    31...26   25...21   20...16   15...11   10...0
Field:   COP0      MF        rt        rd        0
Value:   010000    00000                         000 0000 0000
Width:   6         5         5         5         11

Format: 

MFC0 rt, rd 

Description: 

The contents of coprocessor register rd of CP0 are loaded into general 
register rt. 

Operation: 

32 T:   data <- CPR[0,rd]
   T+1: GPR[rt] <- data

64 T:   data <- CPR[0,rd]
   T+1: GPR[rt] <- (data_31)^32 || data_31...0

Exceptions: 

Coprocessor unusable exception 
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MFCz 



Move From Coprocessor 



MFCz 



Bits:    31...26     25...21   20...16   15...11   10...0
Field:   COPz        MF        rt        rd        0
Value:   0100xx*     00000                         000 0000 0000
Width:   6           5         5         5         11

Format: 

MFCz rt, rd 

Description: 

The contents of coprocessor register rd of coprocessor z are loaded into 
general register rt. 



Operation: 

32 T:   data <- CPR[z,rd]
   T+1: GPR[rt] <- data

64 T:   if rd_0 = 0 then
            data <- CPR[z,rd_4...1 || 0]_31...0
        else
            data <- CPR[z,rd_4...1 || 0]_63...32
        endif
   T+1: GPR[rt] <- (data_31)^32 || data



Exceptions: 

Coprocessor unusable exception 



*See the table "Opcode Bit Encoding" on next page, or "CPU Instruction 
Opcode Bit Encoding" at the end of Appendix A. 
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MFCz 






Opcode Bit Encoding: 

MFCz   Bit #  31  30  29  28  27  26  25  24  23  22  21  ...  0
MFC0           0   1   0   0   0   0   0   0   0   0   0
MFC1           0   1   0   0   0   1   0   0   0   0   0
MFC2           0   1   0   0   1   0   0   0   0   0   0

Bits 31...28: Opcode 
Bits 27...26: Coprocessor Unit Number 
Bits 25...21: Coprocessor Suboperation 
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MpHI Move From HI |^pH| 



Bits:    31...26   25...16        15...11   10...6   5...0
Field:   SPECIAL   0              rd        0        MFHI
Value:   000000    00 0000 0000             00000    010000
Width:   6         10             5         5        6



Format: 

MFHI rd 

Description: 

The contents of special register HI are loaded into general register rd. 

To ensure proper operation in the event of interruptions, the two 
instructions which follow a MFHI instruction may not be any of the 
instructions which modify the HI register: MULT, MULTU, DIV, DIVU, 
MTHI, DMULT, DMULTU, DDIV, DDIVU. 
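
The two-instruction restriction lends itself to a simple static check. The sketch below is illustrative only: it scans a list of mnemonics and is not part of any MIPS toolchain, and the function name `mfhi_hazards` is hypothetical.

```python
# Instructions that modify HI, as listed in the description above.
HI_WRITERS = {"MULT", "MULTU", "DIV", "DIVU", "MTHI",
              "DMULT", "DMULTU", "DDIV", "DDIVU"}


def mfhi_hazards(mnemonics):
    """Return (mfhi_index, writer_index) pairs that violate the restriction."""
    hazards = []
    for i, op in enumerate(mnemonics):
        if op != "MFHI":
            continue
        # Only the two instructions immediately following MFHI are restricted.
        for j in (i + 1, i + 2):
            if j < len(mnemonics) and mnemonics[j] in HI_WRITERS:
                hazards.append((i, j))
    return hazards
```

A sequence such as MFHI, NOP, MULT is flagged, while inserting a second intervening instruction clears the hazard.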

Operation: 

32, 64 T: GPR[rd] <- HI 

Exceptions: 

None 

MFLO    Move From LO    MFLO 

Bits:    31...26   25...16        15...11   10...6   5...0
Field:   SPECIAL   0              rd        0        MFLO
Value:   000000    00 0000 0000             00000    010010
Width:   6         10             5         5        6



Format: 

MFLO rd 

Description: 

The contents of special register LO are loaded into general register rd. 

To ensure proper operation in the event of interruptions, the two 
instructions which follow a MFLO instruction may not be any of the 
instructions which modify the LO register: MULT, MULTU, DIV, DIVU, 
MTLO, DMULT, DMULTU, DDIV, DDIVU. 

Operation: 

32, 64 T: GPR[rd] <- LO 

Exceptions: 

None 
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MTCO System Control Coprocessor MTCO 



Bits:    31...26   25...21   20...16   15...11   10...0
Field:   COP0      MT        rt        rd        0
Value:   010000    00100                         000 0000 0000
Width:   6         5         5         5         11



Format: 

MTC0 rt, rd 

Description: 

The contents of general register rt are loaded into coprocessor register rd 
of CP0. 

Because the state of the virtual address translation system may be altered 
by this instruction, the operation of load instructions, store instructions, 
and TLB operations immediately prior to and after this instruction is 
undefined. 

Operation: 

32, 64 T:   data <- GPR[rt]
       T+1: CPR[0,rd] <- data



Exceptions: 

Coprocessor unusable exception 
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MTCZ l^ove To Coprocessor MTCZ 



Bits:    31...26     25...21   20...16   15...11   10...0
Field:   COPz        MT        rt        rd        0
Value:   0100xx*     00100                         000 0000 0000
Width:   6           5         5         5         11



Format: 

MTCz rt, rd 

Description: 

The contents of general register rt are loaded into coprocessor register rd 
of coprocessor z. 

Operation: 

32 T:   data <- GPR[rt]
   T+1: CPR[z,rd] <- data

64 T:   data <- GPR[rt]_31...0
   T+1: if rd_0 = 0
            CPR[z,rd_4...1 || 0] <- CPR[z,rd_4...1 || 0]_63...32 || data
        else
            CPR[z,rd_4...1 || 0] <- data || CPR[z,rd_4...1 || 0]_31...0
        endif





Exceptions: 



Coprocessor unusable exception 



*Opcode Bit Encoding: 

MTCz   Bit #  31  30  29  28  27  26  25  24  23  22  21  ...  0
COP0           0   1   0   0   0   0   0   0   1   0   0
COP1           0   1   0   0   0   1   0   0   1   0   0
COP2           0   1   0   0   1   0   0   0   1   0   0

Bits 31...28: Opcode 
Bits 27...26: Coprocessor Unit Number 
Bits 25...21: Coprocessor Suboperation 
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MTHI Move To HI MTHI 



Bits:    31...26   25...21   20...6                5...0
Field:   SPECIAL   rs        0                     MTHI
Value:   000000              000 0000 0000 0000    010001
Width:   6         5         15                    6



Format: 

MTHI rs 

Description: 

The contents of general register rs are loaded into special register HI. 

If a MTHI operation is executed following a MULT, MULTU, DIV, or 
DIVU instruction, but before any MFLO, MFHI, MTLO, or MTHI 
instructions, the contents of special register LO are undefined. 



Operation: 

32, 64 T-2: HI <- undefined
       T-1: HI <- undefined
       T:   HI <- GPR[rs]



Exceptions: 

None 
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MTLO Move To LO MTLO 



Bits:    31...26   25...21   20...6                5...0
Field:   SPECIAL   rs        0                     MTLO
Value:   000000              000 0000 0000 0000    010011
Width:   6         5         15                    6



Format: 

MTLO rs 

Description: 

The contents of general register rs are loaded into special register LO. 

If a MTLO operation is executed following a MULT, MULTU, DIV, or 
DIVU instruction, but before any MFLO, MFHI, MTLO, or MTHI 
instructions, the contents of special register HI are undefined. 



Operation: 

32, 64 T-2: LO <- undefined
       T-1: LO <- undefined
       T:   LO <- GPR[rs]



Exceptions: 

None 
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MULT 




Multiply 


MULT 




Bits:    31...26   25...21   20...16   15...6         5...0
Field:   SPECIAL   rs        rt        0              MULT
Value:   000000                        00 0000 0000   011000
Width:   6         5         5         10             6





Format: 

MULT rs, rt 

Description: 

The contents of general registers rs and rt are multiplied, treating both 
operands as 32-bit 2's complement values. No integer overflow exception 
occurs under any circumstances. In 64-bit mode, the operands must be 
valid 32-bit, sign-extended values. 

When the operation completes, the low-order word of the double result is 
loaded into special register LO, and the high-order word of the double 
result is loaded into special register HI. 

If either of the two preceding instructions is MFHI or MFLO, the results of 
these instructions are undefined. Correct operation requires separating 
reads of HI or LO from writes by a minimum of two other instructions. 
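
The split of the signed product into HI and LO can be sketched as follows. This is a minimal model of 32-bit mode only, ignoring pipeline timing; the names `to_signed32` and `mult` are hypothetical.

```python
def to_signed32(x):
    """Interpret the low 32 bits of x as a two's-complement value."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x


def mult(rs, rt):
    """Return (HI, LO) for a signed 32-bit by 32-bit multiply."""
    t = (to_signed32(rs) * to_signed32(rt)) & 0xFFFFFFFFFFFFFFFF
    return t >> 32, t & 0xFFFFFFFF
```

A 64-bit product always fits, which is consistent with the statement that no integer overflow exception occurs.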
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MULT 






Operation: 

32 T-2: LO <- undefined
        HI <- undefined
   T-1: LO <- undefined
        HI <- undefined
   T:   t  <- GPR[rs] * GPR[rt]
        LO <- t_31...0
        HI <- t_63...32

64 T-2: LO <- undefined
        HI <- undefined
   T-1: LO <- undefined
        HI <- undefined
   T:   t  <- GPR[rs]_31...0 * GPR[rt]_31...0
        LO <- (t_31)^32 || t_31...0
        HI <- (t_63)^32 || t_63...32



Exceptions: 

None 
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M U LTU "^"'t'P'y Unsigned M U LTU 



Bits:    31...26   25...21   20...16   15...6         5...0
Field:   SPECIAL   rs        rt        0              MULTU
Value:   000000                        00 0000 0000   011001
Width:   6         5         5         10             6



Format: 

MULTU rs, rt 

Description: 

The contents of general register rs and the contents of general register rt 
are multiplied, treating both operands as unsigned values. No overflow 
exception occurs under any circumstances. In 64-bit mode, the operands 
must be valid 32-bit, sign-extended values. 

When the operation completes, the low-order word of the double result is 
loaded into special register LO, and the high-order word of the double 
result is loaded into special register HI. 

If either of the two preceding instructions is MFHI or MFLO, the results of 
these instructions are undefined. Correct operation requires separating 
reads of HI or LO from writes by a minimum of two other instructions. 
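
For the unsigned case in 64-bit mode, each 32-bit half of the product is sign-extended before being written to HI and LO. The sketch below illustrates that under the assumption that only the low 32 bits of each operand take part; the names `sext32_to_64` and `multu64` are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def sext32_to_64(x):
    """Sign-extend a 32-bit value to a 64-bit register image."""
    x &= MASK32
    return (x | 0xFFFFFFFF00000000) if (x & 0x80000000) else x


def multu64(rs, rt):
    """Return (HI, LO) as 64-bit register images for an unsigned 32x32 multiply."""
    t = (rs & MASK32) * (rt & MASK32)        # operands are treated as unsigned
    return sext32_to_64(t >> 32), sext32_to_64(t & MASK32)
```

Note that the operands are unsigned but the register images are still sign-extended from bit 31 of each half.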
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MULTU 






Operation: 

32 T-2: LO <- undefined
        HI <- undefined
   T-1: LO <- undefined
        HI <- undefined
   T:   t  <- (0 || GPR[rs]) * (0 || GPR[rt])
        LO <- t_31...0
        HI <- t_63...32

64 T-2: LO <- undefined
        HI <- undefined
   T-1: LO <- undefined
        HI <- undefined
   T:   t  <- (0 || GPR[rs]_31...0) * (0 || GPR[rt]_31...0)
        LO <- (t_31)^32 || t_31...0
        HI <- (t_63)^32 || t_63...32





Exceptions: 

None 
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NOR Nor NOR 



Bits:    31...26   25...21   20...16   15...11   10...6   5...0
Field:   SPECIAL   rs        rt        rd        0        NOR
Value:   000000                                  00000    100111
Width:   6         5         5         5         5        6



Format: 

NOR rd, rs, rt 

Description: 

The contents of general register rs are combined with the contents of 
general register rt in a bit-wise logical NOR operation. The result is placed 
into general register rd. 
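
A bit-wise NOR is the complement of a bit-wise OR, truncated to the register width. The following sketch shows this; the function name `nor` and its `width` parameter are hypothetical conveniences.

```python
def nor(a, b, width=32):
    """Bit-wise NOR of a and b, truncated to a register of `width` bits."""
    mask = (1 << width) - 1
    return ~(a | b) & mask
```

When one operand is zero, the result is simply the bit-wise complement of the other operand.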

Operation: 

32, 64 T: GPR[rd] <- GPR[rs] nor GPR[rt] 

Exceptions: 

None 
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OR °' OR 



Bits:    31...26   25...21   20...16   15...11   10...6   5...0
Field:   SPECIAL   rs        rt        rd        0        OR
Value:   000000                                  00000    100101
Width:   6         5         5         5         5        6



Format: 

OR rd, rs, rt 

Description: 

The contents of general register rs are combined with the contents of 
general register rt in a bit-wise logical OR operation. The result is placed 
into general register rd. 
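
The instruction word for a three-register SPECIAL instruction can be assembled by packing the fields in the order shown in the encoding diagram. The sketch below does this for OR and NOR; the function name `encode_rtype` and the `FUNCT` table are hypothetical helpers, with function codes taken from the diagrams in this appendix.

```python
SPECIAL = 0b000000
FUNCT = {"OR": 0b100101, "NOR": 0b100111}


def encode_rtype(mnemonic, rd, rs, rt):
    """Pack opcode, rs, rt, rd, a zero field, and the function code into 32 bits."""
    for reg in (rd, rs, rt):
        if not 0 <= reg <= 31:
            raise ValueError("register numbers are 5-bit fields")
    return ((SPECIAL << 26) | (rs << 21) | (rt << 16)
            | (rd << 11) | (0 << 6) | FUNCT[mnemonic])
```

Note that the assembler operand order (rd, rs, rt) differs from the field order in the word (rs, rt, rd).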

Operation: 

32, 64 T: GPR[rd] <- GPR[rs] or GPR[rt] 

Exceptions: 

None 
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Q P I Or Immediate Q R I 



Bits:    31...26   25...21   20...16   15...0
Field:   ORI       rs        rt        immediate
Value:   001101
Width:   6         5         5         16





Format: 

ORI rt, rs, immediate 

Description: 

The 16-bit immediate is zero-extended and combined with the contents of 
general register rs in a bit-wise logical OR operation. The result is placed 
into general register rt. 
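
Because the immediate is zero-extended, ORI can only affect the low-order 16 bits of the result. A minimal sketch, with the hypothetical function name `ori`:

```python
def ori(rs_value, immediate, width=32):
    """OR a zero-extended 16-bit immediate into a register value."""
    mask = (1 << width) - 1
    return (rs_value | (immediate & 0xFFFF)) & mask
```

An immediate with bit 15 set does not propagate into the upper bits, unlike a sign-extended immediate.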

Operation: 



32 T: GPR[rt] <- GPR[rs]_31...16 || (immediate or GPR[rs]_15...0)
64 T: GPR[rt] <- GPR[rs]_63...16 || (immediate or GPR[rs]_15...0)



Exceptions: 

None 
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SB store Byte SB 



Bits:    31...26   25...21   20...16   15...0
Field:   SB        base      rt        offset
Value:   101000
Width:   6         5         5         16





Format: 

SB rt, offset(base) 

Description: 

The 16-bit offset is sign-extended and added to the contents of general 
register base to form a virtual address. The least-significant byte of register 
rt is stored at the effective address. 

Operation: 

32 T: vAddr <- ((offset_15)^16 || offset_15...0) + GPR[base]
      (pAddr, uncached) <- AddressTranslation (vAddr, DATA)
      pAddr <- pAddr_PSIZE-1...3 || (pAddr_2...0 xor ReverseEndian^3)
      byte <- vAddr_2...0 xor BigEndianCPU^3
      data <- GPR[rt]_63-8*byte...0 || 0^(8*byte)
      StoreMemory (uncached, BYTE, data, pAddr, vAddr, DATA)

64 T: vAddr <- ((offset_15)^48 || offset_15...0) + GPR[base]
      (pAddr, uncached) <- AddressTranslation (vAddr, DATA)
      pAddr <- pAddr_PSIZE-1...3 || (pAddr_2...0 xor ReverseEndian^3)
      byte <- vAddr_2...0 xor BigEndianCPU^3
      data <- GPR[rt]_63-8*byte...0 || 0^(8*byte)
      StoreMemory (uncached, BYTE, data, pAddr, vAddr, DATA)



Exceptions: 

TLB refill exception 
TLB invalid exception 
TLB modification exception 
Bus error exception 
Address error exception 
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SC S*®"*® Conditional SC 



Bits:    31...26   25...21   20...16   15...0
Field:   SC        base      rt        offset
Value:   111000
Width:   6         5         5         16





Format: 

SC rt, offset(base) 

Description: 

The 16-bit offset is sign-extended and added to the contents of general 
register base to form a virtual address. The contents of general register rt 
are conditionally stored at the memory location specified by the effective 
address. 

If any other processor or device has modified the physical address since 
the time of the previous Load Linked instruction, or if an ERET instruction 
occurs between the Load Linked instruction and this store instruction, the 
store fails and is inhibited from taking place. 

The success or failure of the store operation (as defined above) is indicated 
by the contents of general register rt after execution of the instruction. A 
successful store sets the contents of general register rt to 1; an unsuccessful 
store sets it to 0. 

The operation of Store Conditional is undefined when the address is 
different from the address used in the last Load Linked. 

This instruction is available in User mode; it is not necessary for CP0 to be 
enabled. 

If either of the two least-significant bits of the effective address is non-zero, 
an address error exception takes place. 

If this instruction should both fail and take an exception, the exception 
takes precedence. 
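
The success and failure conditions described above can be captured in a toy model. The class below is a sketch under strong simplifying assumptions: a single processor, a dictionary standing in for memory, and an explicit `external_write` call standing in for modification by another processor or device. All names are hypothetical, and the model makes no claim about how the hardware tracks the linked address.

```python
class LLSCModel:
    """Toy model of the link bit used by Load Linked and Store Conditional."""

    def __init__(self):
        self.memory = {}
        self.llbit = 0
        self.ll_addr = None

    def load_linked(self, addr):
        """Load a word and set the link bit."""
        self.llbit = 1
        self.ll_addr = addr
        return self.memory.get(addr, 0)

    def external_write(self, addr, value):
        """Another processor or device modifies memory; a hit breaks the link."""
        self.memory[addr] = value
        if addr == self.ll_addr:
            self.llbit = 0

    def eret(self):
        """An ERET between the load and the store causes the store to fail."""
        self.llbit = 0

    def store_conditional(self, addr, value):
        """Store only if the link is intact; return the value written to rt."""
        if addr & 0x3:
            raise ValueError("address error exception")
        success = self.llbit
        if success:
            self.memory[addr] = value
        return success
```

A typical use is a retry loop: load linked, compute the new value, store conditionally, and repeat the sequence while the returned value is 0.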
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sc 






Operation: 

32 T: vAddr <- ((offset_15)^16 || offset_15...0) + GPR[base]
      (pAddr, uncached) <- AddressTranslation (vAddr, DATA)
      pAddr <- pAddr_PSIZE-1...3 || (pAddr_2...0 xor (ReverseEndian || 0^2))
      data <- GPR[rt]_63-8*byte...0 || 0^(8*byte)
      if LLbit then
          StoreMemory (uncached, WORD, data, pAddr, vAddr, DATA)
      endif
      GPR[rt] <- 0^31 || LLbit

64 T: vAddr <- ((offset_15)^48 || offset_15...0) + GPR[base]
      (pAddr, uncached) <- AddressTranslation (vAddr, DATA)
      pAddr <- pAddr_PSIZE-1...3 || (pAddr_2...0 xor (ReverseEndian || 0^2))
      data <- GPR[rt]_63-8*byte...0 || 0^(8*byte)
      if LLbit then
          StoreMemory (uncached, WORD, data, pAddr, vAddr, DATA)
      endif
      GPR[rt] <- 0^63 || LLbit

Exceptions: 

TLB refill exception 
TLB invalid exception 
TLB modification exception 
Bus error exception 
Address error exception 
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SC D store Conditional Doubleword SC D 



Bits:    31...26   25...21   20...16   15...0
Field:   SCD       base      rt        offset
Value:   111100
Width:   6         5         5         16





Format: 

SCD rt, offset(base) 

Description: 

The 16-bit offset is sign-extended and added to the contents of general 
register base to form a virtual address. The contents of general register rt 
are conditionally stored at the memory location specified by the effective 
address. 

If any other processor or device has modified the physical address since 
the time of the previous Load Linked Doubleword instruction, or if an 
ERET instruction occurs between the Load Linked Doubleword 
instruction and this store instruction, the store fails and is inhibited from 
taking place. 

The success or failure of the store operation (as defined above) is indicated 
by the contents of general register rt after execution of the instruction. A 
successful store sets the contents of general register rt to 1; an unsuccessful 
store sets it to 0. 

The operation of Store Conditional Doubleword is undefined when the 
address is different from the address used in the last Load Linked 
Doubleword. 

This instruction is available in User mode; it is not necessary for CP0 to be 
enabled. 

If any of the three least-significant bits of the effective address is 
non-zero, an address error exception takes place. 
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SCD 






If this instruction should both fail and take an exception, the exception 
takes precedence. 

This operation is only defined for the R4000 operating in 64-bit mode. 
Execution of this instruction in 32-bit mode causes a reserved instruction 
exception. 



Operation: 

64 T: vAddr <- ((offset_15)^48 || offset_15...0) + GPR[base]
      (pAddr, uncached) <- AddressTranslation (vAddr, DATA)
      data <- GPR[rt]
      if LLbit then
          StoreMemory (uncached, DOUBLEWORD, data, pAddr, vAddr, DATA)
      endif
      GPR[rt] <- 0^63 || LLbit

Exceptions: 

TLB refill exception 
TLB invalid exception 
TLB modification exception 
Bus error exception 
Address error exception 
Reserved instruction exception (R4000 in 32-bit mode) 
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3D store Doubleword 3D 



Bits:    31...26   25...21   20...16   15...0
Field:   SD        base      rt        offset
Value:   111111
Width:   6         5         5         16





Format: 

SD rt, offset(base) 

Description: 

The 16-bit offset is sign-extended and added to the contents of general 
register base to form a virtual address. The contents of general register rt 
are stored at the memory location specified by the effective address. 

If any of the three least-significant bits of the effective address is 
non-zero, an address error exception occurs. 

This operation is only defined for the R4000 operating in 64-bit mode. 
Execution of this instruction in 32-bit mode causes a reserved instruction 
exception. 

Operation: 

64 T: vAddr <- ((offset_15)^48 || offset_15...0) + GPR[base] 

(pAddr, uncached) <- AddressTranslation (vAddr, DATA) 
data <- GPR[rt] 

StoreMemory (uncached, DOUBLEWORD, data, pAddr, vAddr, DATA) 



Exceptions: 

TLB refill exception 
TLB invalid exception 
TLB modification exception 
Bus error exception 

Address error exception 

Reserved instruction exception (R4000 in 32-bit user mode, 
R4000 in 32-bit supervisor mode) 
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SDCZ store Doubleword SDCZ 

From Coprocessor 



Bits:    31...26     25...21   20...16   15...0
Field:   SDCz        base      rt        offset
Value:   1111xx*
Width:   6           5         5         16





Format: 

SDCz rt, offset(base) 

Description: 

The 16-bit offset is sign-extended and added to the contents of general 
register base to form a virtual address. Coprocessor unit z sources a 
doubleword, which the processor writes to the addressed memory 
location. The data to be stored is defined by individual coprocessor 
specifications. 

If any of the three least-significant bits of the effective address is 
non-zero, an address error exception takes place. 

This instruction is not valid for use with CP0. 

This instruction is undefined when the least-significant bit of the rt field is 
non-zero. 

Operation: 

32 T: vAddr <- ((offset_15)^16 || offset_15...0) + GPR[base]
      (pAddr, uncached) <- AddressTranslation (vAddr, DATA)
      data <- COPzSD(rt)
      StoreMemory (uncached, DOUBLEWORD, data, pAddr, vAddr, DATA)

64 T: vAddr <- ((offset_15)^48 || offset_15...0) + GPR[base]
      (pAddr, uncached) <- AddressTranslation (vAddr, DATA)
      data <- COPzSD(rt)
      StoreMemory (uncached, DOUBLEWORD, data, pAddr, vAddr, DATA)



*See the table, "Opcode Bit Encoding" on next page, or "CPU Instruction 
Opcode Bit Encoding" at the end of Appendix A. 
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SDCz 






Exceptions: 

TLB refill exception 
TLB invalid exception 
TLB modification exception 
Bus error exception 
Address error exception 
Coprocessor unusable exception 

Opcode Bit Encoding: 

SDCz   Bit #  31  30  29  28  27  26  ...  0
SDC1           1   1   1   1   0   1
SDC2           1   1   1   1   1   0

Bits 31...28: SD opcode 
Bits 27...26: Coprocessor Unit Number 
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SDL Doubleword Left SDL 



Bits:    31...26   25...21   20...16   15...0
Field:   SDL       base      rt        offset
Value:   101100
Width:   6         5         5         16





Format: 

SDL rt, offset(base) 

Description: 

This instruction can be used with the SDR instruction to store the contents 
of a register into eight consecutive bytes of memory, when the bytes cross 
a doubleword boundary. SDL stores the left portion of the register into the 
appropriate part of the high-order doubleword of memory; SDR stores the 
right portion of the register into the appropriate part of the low-order 
doubleword. 

The SDL instruction adds its sign-extended 16-bit offset to the contents of 
general register base to form a virtual address which may specify an 
arbitrary byte. It alters only the doubleword in memory which contains that 
byte. From one to eight bytes will be stored, depending on the starting byte 
specified. 

Conceptually, it starts at the most-significant byte of the register and 
copies it to the specified byte in memory; then it copies bytes from register 
to memory until it reaches the low-order byte of the doubleword in memory. 

No address exceptions due to alignment are possible. 
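
The conceptual behaviour of SDL can be sketched as a byte copy. The model below assumes a big-endian CPU, a flat `bytearray` for memory, and no translation or exceptions, and it copies 8 - (addr mod 8) bytes so that it stops at the end of the doubleword containing the addressed byte. The function name `sdl_big_endian` is hypothetical.

```python
def sdl_big_endian(reg, memory, addr):
    """Model SDL: copy the most-significant register bytes into memory from addr."""
    n = 8 - (addr & 7)                       # bytes from addr to the end of the doubleword
    data = (reg & 0xFFFFFFFFFFFFFFFF).to_bytes(8, "big")[:n]
    memory[addr:addr + n] = data             # bytes below addr are left unchanged
```

With the register holding bytes A through H and memory holding I through P, an SDL at address 1 leaves byte I in place and overwrites the remaining seven bytes with A through G.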





[Figure: example of SDL $24,1($0) with big-endian memory, showing register $24 (bytes A through H) and memory addresses 0 through 15 before and after the store] 
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Q|-^l Store Doubleword Left Qf^l 




This operation is only defined for the R4000 operating in 64-bit mode. 
Execution of this instruction in 32-bit mode causes a reserved instruction 
exception. 



Operation: 

64 T: vAddr <- ((offset_15)^48 || offset_15...0) + GPR[base]
      (pAddr, uncached) <- AddressTranslation (vAddr, DATA)
      pAddr <- pAddr_PSIZE-1...3 || (pAddr_2...0 xor ReverseEndian^3)
      if BigEndianMem = 0 then
          pAddr <- pAddr_31...3 || 0^3
      endif
      byte <- vAddr_2...0 xor BigEndianCPU^3
      data <- 0^(56-8*byte) || GPR[rt]_63...56-8*byte
      StoreMemory (uncached, byte, data, pAddr, vAddr, DATA)
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Store Doubleword Left 



Given a doubleword in a register and a doubleword in memory, the 
operation of SDL is as follows: 

SDL 

Register:  A B C D E F G H
Memory:    I J K L M N O P

             BigEndianCPU = 0                      BigEndianCPU = 1
vAddr_2...0  destination        type  offset       destination        type  offset
                                      LEM  BEM                              LEM  BEM
0            I J K L M N O A    0     0    7       A B C D E F G H    7     0    0
1            I J K L M N A B    1     0    6       I A B C D E F G    6     0    1
2            I J K L M A B C    2     0    5       I J A B C D E F    5     0    2
3            I J K L A B C D    3     0    4       I J K A B C D E    4     0    3
4            I J K A B C D E    4     0    3       I J K L A B C D    3     0    4
5            I J A B C D E F    5     0    2       I J K L M A B C    2     0    5
6            I A B C D E F G    6     0    1       I J K L M N A B    1     0    6
7            A B C D E F G H    7     0    0       I J K L M N O A    0     0    7

LEM     Little-endian memory (BigEndianMem = 0) 
BEM     BigEndianMem = 1 
Type    AccessType (see Table 2-1) sent to memory 
Offset  pAddr_2...0 sent to memory 


Exceptions: 

TLB refill exception 
TLB invalid exception 
TLB modification exception 
Bus error exception 
Address error exception 

Reserved instruction exception (R4000 in 32-bit mode) 
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SDR 



Store Doubleword Right 



SDR 



    31     26  25    21  20    16  15                 0 
    SDR        base      rt        offset 
    101101 
    6          5         5         16 





Format: 

SDR rt, offset(base) 

Description: 

This instruction can be used with the SDL instruction to store the contents 
of a register into eight consecutive bytes of memory, when the bytes cross 
a boundary between two doublewords. SDR stores the right portion of the 
register into the appropriate part of the low-order doubleword; SDL stores 
the left portion of the register into the appropriate part of the high-order 
doubleword of memory. 

The SDR instruction adds its sign-extended 16-bit offset to the contents of 
general register base to form a virtual address which may specify an 
arbitrary byte. It alters only the doubleword in memory which contains that 
byte. From one to eight bytes will be stored, depending on the starting byte 
specified. 

Conceptually, it starts at the least-significant (rightmost) byte of the 
register and copies it to the specified byte in memory; then it copies bytes 
from register to memory until it reaches the high-order byte of the 
doubleword in memory. 

No address exceptions due to alignment are possible. 



[Figure: SDR $24,4($0) on big-endian memory, showing the doublewords at address 8 and address 0 and register $24 (A B C D E F G H) before and after the store.] 



This operation is only defined for the R4000 operating in 64-bit mode. 
Execution of this instruction in 32-bit mode causes a reserved instruction 
exception. 



Operation: 

64    T:  vAddr <- ((offset_15)^48 || offset_15...0) + GPR[base] 
          (pAddr, uncached) <- AddressTranslation (vAddr, DATA) 
          pAddr <- pAddr_PSIZE-1...3 || (pAddr_2...0 xor ReverseEndian^3) 
          if BigEndianMem = 0 then 
              pAddr <- pAddr_PSIZE-1...3 || 0^3 
          endif 
          byte <- vAddr_2...0 xor BigEndianCPU^3 
          data <- GPR[rt]_63-8*byte...0 || 0^(8*byte) 
          StoreMemory (uncached, DOUBLEWORD-byte, data, pAddr, vAddr, DATA) 



Given a doubleword in a register and a doubleword in memory, the 
operation of SDR is as follows: 



SDR 

    Register   A B C D E F G H 
    Memory     I J K L M N O P 

                  BigEndianCPU = 0                   BigEndianCPU = 1 
                                          offset                             offset 
    vAddr_2...0   destination      type  LEM  BEM    destination      type  LEM  BEM 
         0        A B C D E F G H   7     0    0     H J K L M N O P   0     7    0 
         1        B C D E F G H P   6     1    0     G H K L M N O P   1     6    0 
         2        C D E F G H O P   5     2    0     F G H L M N O P   2     5    0 
         3        D E F G H N O P   4     3    0     E F G H M N O P   3     4    0 
         4        E F G H M N O P   3     4    0     D E F G H N O P   4     3    0 
         5        F G H L M N O P   2     5    0     C D E F G H O P   5     2    0 
         6        G H K L M N O P   1     6    0     B C D E F G H P   6     1    0 
         7        H J K L M N O P   0     7    0     A B C D E F G H   7     0    0 

    LEM      Little-endian memory (BigEndianMem = 0) 
    BEM      BigEndianMem = 1 
    Type     AccessType (see Table 2-1) sent to memory 
    Offset   pAddr_2...0 sent to memory 



Exceptions: 

TLB refill exception 
TLB invalid exception 
TLB modification exception 
Bus error exception 
Address error exception 

Reserved instruction exception (R4000 in 32-bit mode) 
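
The byte movement performed by SDL and SDR can be illustrated with a small C model. This is only a sketch under stated assumptions: it covers the big-endian case (BigEndianCPU = 1) alone, it treats memory as a plain byte array, and the function names sdl_be, sdr_be and store_unaligned_be are hypothetical names introduced here, not part of the architecture. 

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model: mem points at an aligned 8-byte doubleword laid out
 * big-endian (mem[0] is the high-order byte); byte is vAddr & 7. */

/* SDL: copy register bytes, most-significant first, into mem[byte..7]. */
static void sdl_be(uint8_t mem[8], unsigned byte, uint64_t reg)
{
    assert(byte < 8);
    for (unsigned i = byte; i < 8; i++) {
        unsigned src = i - byte;              /* 0 = most-significant byte */
        mem[i] = (uint8_t)(reg >> (56 - 8 * src));
    }
}

/* SDR: copy register bytes, least-significant first, into mem[byte..0]. */
static void sdr_be(uint8_t mem[8], unsigned byte, uint64_t reg)
{
    assert(byte < 8);
    for (unsigned i = 0; i <= byte; i++) {
        mem[byte - i] = (uint8_t)(reg >> (8 * i));
    }
}

/* Unaligned big-endian doubleword store built from the pair, mirroring
 * SDL rt,0(addr) followed by SDR rt,7(addr). */
static void store_unaligned_be(uint8_t *memory, size_t addr, uint64_t reg)
{
    size_t last = addr + 7;
    sdl_be(memory + (addr & ~(size_t)7), (unsigned)(addr & 7), reg);
    sdr_be(memory + (last & ~(size_t)7), (unsigned)(last & 7), reg);
}
```

With the register holding the bytes A through H, sdl_be at byte 1 and sdr_be at byte 4 reproduce the BigEndianCPU = 1 rows for vAddr 1 and vAddr 4 of the two tables above. 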
    31     26  25    21  20    16  15                 0 
    SH         base      rt        offset 
    101001 
    6          5         5         16 





Format: 

SH rt, offset(base) 

Description: 

The 16-bit offset is sign-extended and added to the contents of general 
register base to form an unsigned effective address. The least-significant 
halfword of register rt is stored at the effective address. If the 
least-significant bit of the effective address is non-zero, an address error 
exception occurs. 

Operation: 

32    T:  vAddr <- ((offset_15)^16 || offset_15...0) + GPR[base] 
          (pAddr, uncached) <- AddressTranslation (vAddr, DATA) 
          pAddr <- pAddr_PSIZE-1...3 || (pAddr_2...0 xor (ReverseEndian^2 || 0)) 
          byte <- vAddr_2...0 xor (BigEndianCPU^2 || 0) 
          data <- GPR[rt]_63-8*byte...0 || 0^(8*byte) 
          StoreMemory (uncached, HALFWORD, data, pAddr, vAddr, DATA) 

64    T:  vAddr <- ((offset_15)^48 || offset_15...0) + GPR[base] 
          (pAddr, uncached) <- AddressTranslation (vAddr, DATA) 
          pAddr <- pAddr_PSIZE-1...3 || (pAddr_2...0 xor (ReverseEndian^2 || 0)) 
          byte <- vAddr_2...0 xor (BigEndianCPU^2 || 0) 
          data <- GPR[rt]_63-8*byte...0 || 0^(8*byte) 
          StoreMemory (uncached, HALFWORD, data, pAddr, vAddr, DATA) 

Exceptions: 

TLB refill exception 
TLB invalid exception 
TLB modification exception 
Bus error exception 
Address error exception 
SLL                    Shift Left Logical                    SLL 

    31     26  25    21  20    16  15    11  10     6  5      0 
    SPECIAL    0         rt        rd        sa        SLL 
    000000     00000                                   000000 
    6          5         5         5         5         6 



Format: 

SLL rd, rt, sa 

Description: 

The contents of general register rt are shifted left by sa bits, inserting zeros 
into the low-order bits. 

The result is placed in register rd. 

In 64-bit mode, the 32-bit result is sign extended when placed in the 
destination register. It is sign extended for all shift amounts, including 
zero; SLL with a zero shift amount truncates a 64-bit value to 32 bits and 
then sign extends this 32-bit value. SLL, unlike nearly all other word 
operations, does not require an operand to be a properly sign-extended 
word value to produce a valid sign-extended word result. 

NOTE: SLL with a shift amount of zero may be treated as a NOP by 
some assemblers, at some optimization levels. If using SLL with a 
zero shift to truncate 64-bit values, check the assembler you are using. 

Operation: 

32    T:  GPR[rd] <- GPR[rt]_31-sa...0 || 0^sa 

64    T:  s <- 0 || sa 
          temp <- GPR[rt]_31-s...0 || 0^s 
          GPR[rd] <- (temp_31)^32 || temp 



Exceptions: 

None 
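
The 64-bit behavior described above (shift the low word, then sign-extend bit 31 of the result) can be sketched in C as follows. The function name sll64 is a hypothetical name used only for this illustration, and the register is modeled as a plain uint64_t. 

```c
#include <assert.h>
#include <stdint.h>

/* Model of SLL in 64-bit mode: only the low word of rt takes part. */
static uint64_t sll64(uint64_t rt, unsigned sa)
{
    assert(sa < 32);                          /* sa is a 5-bit field */
    uint32_t temp = (uint32_t)rt << sa;       /* GPR[rt]_31-sa...0 || 0^sa */
    uint64_t ext = (temp & 0x80000000u) ? 0xFFFFFFFF00000000ULL : 0;
    return ext | temp;                        /* (temp_31)^32 || temp */
}
```

Calling sll64 with a shift amount of zero shows the truncate-then-sign-extend effect: the upper 32 bits of the operand are discarded and replaced by copies of bit 31. 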
SLLV               Shift Left Logical Variable               SLLV 

    31     26  25    21  20    16  15    11  10     6  5      0 
    SPECIAL    rs        rt        rd        0         SLLV 
    000000                                   00000     000100 
    6          5         5         5         5         6 



Format: 

SLLV rd, rt, rs 

Description: 

The contents of general register rt are shifted left the number of bits 
specified by the low-order five bits contained in general register rs, 
inserting zeros into the low-order bits. 

The result is placed in register rd. 

In 64-bit mode, the 32-bit result is sign extended when placed in the 
destination register. It is sign extended for all shift amounts, including 
zero; SLLV with a zero shift amount truncates a 64-bit value to 32 bits and 
then sign extends this 32-bit value. SLLV, unlike nearly all other word 
operations, does not require an operand to be a properly sign-extended 
word value to produce a valid sign-extended word result. 

NOTE: SLLV with a shift amount of zero may be treated as a NOP by 
some assemblers, at some optimization levels. If using SLLV with a 
zero shift to truncate 64-bit values, check the assembler you are using. 

Operation: 

32    T:  s <- GPR[rs]_4...0 
          GPR[rd] <- GPR[rt]_(31-s)...0 || 0^s 

64    T:  s <- 0 || GPR[rs]_4...0 
          temp <- GPR[rt]_(31-s)...0 || 0^s 
          GPR[rd] <- (temp_31)^32 || temp 



Exceptions: 

None 
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SLT 



Set On Less Than 



SLT 



    31     26  25    21  20    16  15    11  10     6  5      0 
    SPECIAL    rs        rt        rd        0         SLT 
    000000                                   00000     101010 
    6          5         5         5         5         6 



Format: 

SLT rd, rs, rt 

Description: 

The contents of general register rt are subtracted from the contents of 
general register rs. Considering both quantities as signed integers, if the 
contents of general register rs are less than the contents of general register 
rt, the result is set to one; otherwise the result is set to zero. 

The result is placed into general register rd. 

No integer overflow exception occurs under any circumstances. The 
comparison is valid even if the subtraction used during the comparison 
overflows. 



Operation: 



32    T:  if GPR[rs] < GPR[rt] then 
              GPR[rd] <- 0^31 || 1 
          else 
              GPR[rd] <- 0^32 
          endif 

64    T:  if GPR[rs] < GPR[rt] then 
              GPR[rd] <- 0^63 || 1 
          else 
              GPR[rd] <- 0^64 
          endif 



Exceptions: 

None 
SLTI               Set On Less Than Immediate               SLTI 

    31     26  25    21  20    16  15                 0 
    SLTI       rs        rt        immediate 
    001010 
    6          5         5         16 





Format: 

SLTI rt, rs, immediate 

Description: 

The 16-bit immediate is sign-extended and subtracted from the contents of 
general register rs. Considering both quantities as signed integers, if rs is 
less than the sign-extended immediate, the result is set to one; otherwise 
the result is set to zero. 

The result is placed into general register rt. 

No integer overflow exception occurs under any circumstances. The 
comparison is valid even if the subtraction used during the comparison 
overflows. 



Operation: 



32    T:  if GPR[rs] < (immediate_15)^16 || immediate_15...0 then 
              GPR[rt] <- 0^31 || 1 
          else 
              GPR[rt] <- 0^32 
          endif 

64    T:  if GPR[rs] < (immediate_15)^48 || immediate_15...0 then 
              GPR[rt] <- 0^63 || 1 
          else 
              GPR[rt] <- 0^64 
          endif 







Exceptions: 

None 
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SLTIU 



Set On Less Than 
Immediate Unsigned 



SLTIU 



    31     26  25    21  20    16  15                 0 
    SLTIU      rs        rt        immediate 
    001011 
    6          5         5         16 





Format: 

SLTIU rt, rs, immediate 

Description: 

The 16-bit immediate is sign-extended and subtracted from the contents of 
general register rs. Considering both quantities as unsigned integers, if rs 
is less than the sign-extended immediate, the result is set to one; otherwise 
the result is set to zero. 

The result is placed into general register rt. 

No integer overflow exception occurs under any circumstances. The 
comparison is valid even if the subtraction used during the comparison 
overflows. 

Operation: 



32    T:  if (0 || GPR[rs]) < (immediate_15)^16 || immediate_15...0 then 
              GPR[rt] <- 0^31 || 1 
          else 
              GPR[rt] <- 0^32 
          endif 

64    T:  if (0 || GPR[rs]) < (immediate_15)^48 || immediate_15...0 then 
              GPR[rt] <- 0^63 || 1 
          else 
              GPR[rt] <- 0^64 
          endif 







Exceptions: 

None 
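
Because SLTIU sign-extends its immediate before an unsigned comparison, an immediate with bit 15 set compares as a very large unsigned value. The following C sketch contrasts the SLTI and SLTIU comparisons in 64-bit mode. The names sign_extend16, slti and sltiu are hypothetical helpers introduced for this illustration only; the signed comparison is written by flipping the sign bit of both operands, which is one portable way to express it and is not taken from the manual. 

```c
#include <stdint.h>

#define SIGN_BIT 0x8000000000000000ULL

/* (immediate_15)^48 || immediate_15...0 */
static uint64_t sign_extend16(uint16_t imm)
{
    return (imm & 0x8000u) ? (0xFFFFFFFFFFFF0000ULL | imm) : imm;
}

/* SLTI: both quantities are treated as signed integers. Flipping the sign
 * bit of each operand turns a signed comparison into an unsigned one. */
static uint64_t slti(uint64_t rs, uint16_t imm)
{
    return ((rs ^ SIGN_BIT) < (sign_extend16(imm) ^ SIGN_BIT)) ? 1 : 0;
}

/* SLTIU: same sign-extended immediate, compared as unsigned. */
static uint64_t sltiu(uint64_t rs, uint16_t imm)
{
    return (rs < sign_extend16(imm)) ? 1 : 0;
}
```

For example, with rs equal to 5 and an immediate of 0xFFFF, slti sees the comparison 5 < -1, while sltiu sees 5 compared against the largest 64-bit unsigned value. 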






SLTU Set On Less Than Unsigned SLTU 



    31     26  25    21  20    16  15    11  10     6  5      0 
    SPECIAL    rs        rt        rd        0         SLTU 
    000000                                   00000     101011 
    6          5         5         5         5         6 



Format: 

SLTU rd, rs, rt 

Description: 

The contents of general register rt are subtracted from the contents of 
general register rs. Considering both quantities as unsigned integers, if the 
contents of general register rs are less than the contents of general register 
rt, the result is set to one; otherwise the result is set to zero. 

The result is placed into general register rd. 

No integer overflow exception occurs under any circumstances. The 
comparison is valid even if the subtraction used during the comparison 
overflows. 



Operation: 



32    T:  if (0 || GPR[rs]) < 0 || GPR[rt] then 
              GPR[rd] <- 0^31 || 1 
          else 
              GPR[rd] <- 0^32 
          endif 

64    T:  if (0 || GPR[rs]) < 0 || GPR[rt] then 
              GPR[rd] <- 0^63 || 1 
          else 
              GPR[rd] <- 0^64 
          endif 





Exceptions: 

None 
SRA                  Shift Right Arithmetic                  SRA 

    31     26  25    21  20    16  15    11  10     6  5      0 
    SPECIAL    0         rt        rd        sa        SRA 
    000000     00000                                   000011 
    6          5         5         5         5         6 



Format: 

SRA rd, rt, sa 

Description: 

The contents of general register rt are shifted right by sa bits, sign- 
extending the high-order bits. 

The result is placed in register rd. 

In 64-bit mode, the operand must be a valid sign-extended, 32-bit value. 

Operation: 

32    T:  GPR[rd] <- (GPR[rt]_31)^sa || GPR[rt]_31...sa 

64    T:  s <- 0 || sa 
          temp <- (GPR[rt]_31)^s || GPR[rt]_31...s 
          GPR[rd] <- (temp_31)^32 || temp 

Exceptions: 

None 
SRAV             Shift Right Arithmetic Variable             SRAV 

    31     26  25    21  20    16  15    11  10     6  5      0 
    SPECIAL    rs        rt        rd        0         SRAV 
    000000                                   00000     000111 
    6          5         5         5         5         6 



Format: 

SRAV rd, rt, rs 

Description: 

The contents of general register rt are shifted right by the number of bits 
specified by the low-order five bits of general register rs, sign-extending 
the high-order bits. 

The result is placed in register rd. 

In 64-bit mode, the operand must be a valid sign-extended, 32-bit value. 

Operation: 

32    T:  s <- GPR[rs]_4...0 
          GPR[rd] <- (GPR[rt]_31)^s || GPR[rt]_31...s 

64    T:  s <- GPR[rs]_4...0 
          temp <- (GPR[rt]_31)^s || GPR[rt]_31...s 
          GPR[rd] <- (temp_31)^32 || temp 

Exceptions: 

None 
SRL                    Shift Right Logical                    SRL 

    31     26  25    21  20    16  15    11  10     6  5      0 
    SPECIAL    0         rt        rd        sa        SRL 
    000000     00000                                   000010 
    6          5         5         5         5         6 



Format: 

SRL rd, rt, sa 

Description: 

The contents of general register rt are shifted right by sa bits, inserting 
zeros into the high-order bits. 

The result is placed in register rd. 

In 64-bit mode, the operand must be a valid sign-extended, 32-bit value. 

Operation: 

32    T:  GPR[rd] <- 0^sa || GPR[rt]_31...sa 

64    T:  s <- 0 || sa 
          temp <- 0^s || GPR[rt]_31...s 
          GPR[rd] <- (temp_31)^32 || temp 

Exceptions: 

None 
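
The difference between the logical and arithmetic right shifts, together with the final sign extension performed in 64-bit mode, can be sketched in C. The names sext32, srl64 and sra64 are hypothetical and are used here only for illustration; the sketch assumes the operand is a properly sign-extended 32-bit value, as the descriptions require. 

```c
#include <assert.h>
#include <stdint.h>

/* (w_31)^32 || w : sign-extend a 32-bit word to 64 bits. */
static uint64_t sext32(uint32_t w)
{
    return (w & 0x80000000u) ? (0xFFFFFFFF00000000ULL | w) : w;
}

/* SRL in 64-bit mode: zeros enter at the top of the low word. */
static uint64_t srl64(uint64_t rt, unsigned sa)
{
    assert(sa < 32);
    return sext32((uint32_t)rt >> sa);        /* 0^s || GPR[rt]_31...s */
}

/* SRA in 64-bit mode: copies of bit 31 enter at the top of the low word. */
static uint64_t sra64(uint64_t rt, unsigned sa)
{
    assert(sa < 32);
    uint32_t w = (uint32_t)rt;
    uint32_t shifted = w >> sa;
    if ((w & 0x80000000u) && sa > 0)
        shifted |= ~(0xFFFFFFFFu >> sa);      /* (GPR[rt]_31)^s fill bits */
    return sext32(shifted);
}
```

Note that srl64 with a non-zero shift always yields a result whose bit 31 is clear, so the upper 32 bits of the destination are zero in that case. 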
    31     26  25    21  20    16  15    11  10     6  5      0 
    SPECIAL    rs        rt        rd        0         SRLV 
    000000                                   00000     000110 
    6          5         5         5         5         6 



Format: 

SRLV rd, rt, rs 

Description: 

The contents of general register rt are shifted right by the number of bits 
specified by the low-order five bits of general register rs, inserting zeros 
into the high-order bits. 

The result is placed in register rd. 

In 64-bit mode, the operand must be a valid sign-extended, 32-bit value. 

Operation: 

32    T:  s <- GPR[rs]_4...0 
          GPR[rd] <- 0^s || GPR[rt]_31...s 

64    T:  s <- GPR[rs]_4...0 
          temp <- 0^s || GPR[rt]_31...s 
          GPR[rd] <- (temp_31)^32 || temp 

Exceptions: 

None 
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SUB Subtract SUB 



    31     26  25    21  20    16  15    11  10     6  5      0 
    SPECIAL    rs        rt        rd        0         SUB 
    000000                                   00000     100010 
    6          5         5         5         5         6 



Format: 

SUB rd, rs, rt 

Description: 

The contents of general register rt are subtracted from the contents of 
general register rs to form a result. The result is placed into general 
register rd. In 64-bit mode, the operands must be valid sign-extended, 
32-bit values. 

The only difference between this instruction and the SUBU instruction is 
that SUBU never traps on overflow. 

An integer overflow exception takes place if the carries out of bits 30 and 
31 differ (2's complement overflow). The destination register rd is not 
modified when an integer overflow exception occurs. 

Operation: 

32    T:  GPR[rd] <- GPR[rs] - GPR[rt] 

64    T:  temp <- GPR[rs] - GPR[rt] 
          GPR[rd] <- (temp_31)^32 || temp_31...0 

Exceptions: 

Integer overflow exception 
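
The overflow condition for SUB can be sketched in C. The test below uses operand and result signs rather than the carries out of bits 30 and 31; for two's-complement subtraction the two formulations are equivalent. The function name sub32 and its return convention are hypothetical, chosen only for this illustration. 

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true when rs - rt overflows as a 32-bit two's-complement value.
 * As with SUB, *rd is left unmodified when overflow is detected. */
static bool sub32(uint32_t rs, uint32_t rt, uint32_t *rd)
{
    uint32_t diff = rs - rt;                  /* wraps modulo 2^32 */
    /* Overflow iff the operands differ in sign and the sign of the
     * result differs from the sign of rs. */
    bool overflow = (((rs ^ rt) & (rs ^ diff)) >> 31) != 0;
    if (!overflow)
        *rd = diff;
    return overflow;
}
```

SUBU corresponds to computing diff alone and always writing it to the destination. 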
    31     26  25    21  20    16  15    11  10     6  5      0 
    SPECIAL    rs        rt        rd        0         SUBU 
    000000                                   00000     100011 
    6          5         5         5         5         6 



Format: 

SUBU rd, rs, rt 

Description: 

The contents of general register rt are subtracted from the contents of 
general register rs to form a result. 

The result is placed into general register rd. 

In 64-bit mode, the operands must be valid sign-extended, 32-bit values. 

The only difference between this instruction and the SUB instruction is 
that SUBU never traps on overflow. No integer overflow exception occurs 
under any circumstances. 

Operation: 

32    T:  GPR[rd] <- GPR[rs] - GPR[rt] 

64    T:  temp <- GPR[rs] - GPR[rt] 
          GPR[rd] <- (temp_31)^32 || temp_31...0 

Exceptions: 

None 
SW                         Store Word                         SW 

    31     26  25    21  20    16  15                 0 
    SW         base      rt        offset 
    101011 
    6          5         5         16 





Format: 

SW rt, offset(base) 

Description: 

The 16-bit offset is sign-extended and added to the contents of general 
register base to form a virtual address. The contents of general register rt 
are stored at the memory location specified by the effective address. 

If either of the two least-significant bits of the effective address is 
non-zero, an address error exception occurs. 

Operation: 

32    T:  vAddr <- ((offset_15)^16 || offset_15...0) + GPR[base] 
          (pAddr, uncached) <- AddressTranslation (vAddr, DATA) 
          pAddr <- pAddr_PSIZE-1...3 || (pAddr_2...0 xor (ReverseEndian || 0^2)) 
          byte <- vAddr_2...0 xor (BigEndianCPU || 0^2) 
          data <- GPR[rt]_63-8*byte...0 || 0^(8*byte) 
          StoreMemory (uncached, WORD, data, pAddr, vAddr, DATA) 

64    T:  vAddr <- ((offset_15)^48 || offset_15...0) + GPR[base] 
          (pAddr, uncached) <- AddressTranslation (vAddr, DATA) 
          pAddr <- pAddr_PSIZE-1...3 || (pAddr_2...0 xor (ReverseEndian || 0^2)) 
          byte <- vAddr_2...0 xor (BigEndianCPU || 0^2) 
          data <- GPR[rt]_63-8*byte...0 || 0^(8*byte) 
          StoreMemory (uncached, WORD, data, pAddr, vAddr, DATA) 



Exceptions: 

TLB refill exception 
TLB invalid exception 
TLB modification exception 
Bus error exception 
Address error exception 
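
The address formation and alignment check shared by SH and SW in 32-bit mode can be sketched in C. This is a minimal illustration under stated assumptions: the function name store_vaddr32 is hypothetical, the access size is passed in bytes (2 for SH, 4 for SW), and a false return stands in for the address error exception; address translation is not modeled. 

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Forms vAddr = sign_extend(offset) + base and reports whether it is
 * aligned for an access of `size` bytes (size must be a power of two). */
static bool store_vaddr32(uint32_t base, uint16_t offset, unsigned size,
                          uint32_t *vaddr)
{
    assert(size == 2 || size == 4);
    /* (offset_15)^16 || offset_15...0 */
    uint32_t ext = (offset & 0x8000u) ? (0xFFFF0000u | offset) : offset;
    *vaddr = base + ext;                      /* wraps modulo 2^32 */
    return (*vaddr & (size - 1)) == 0;        /* false: address error */
}
```

A negative offset such as 0xFFFC (that is, -4) therefore moves the address downward from the base register. 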
    31     26  25    21  20    16  15                 0 
    SWCz       base      rt        offset 
    1110xx* 
    6          5         5         16 





Format: 

SWCz rt, offset(base) 

Description: 

The 16-bit offset is sign-extended and added to the contents of general 
register base to form a virtual address. Coprocessor unit z sources a word, 
which the processor writes to the addressed memory location. 

The data to be stored is defined by individual coprocessor specifications. 

This instruction is not valid for use with CP0. 

If either of the two least-significant bits of the effective address is non-zero, 
an address error exception occurs. 

Operation: 

32    T:  vAddr <- ((offset_15)^16 || offset_15...0) + GPR[base] 
          (pAddr, uncached) <- AddressTranslation (vAddr, DATA) 
          pAddr <- pAddr_PSIZE-1...3 || (pAddr_2...0 xor (ReverseEndian || 0^2)) 
          byte <- vAddr_2...0 xor (BigEndianCPU || 0^2) 
          data <- COPzSW (byte, rt) 
          StoreMemory (uncached, WORD, data, pAddr, vAddr, DATA) 

64    T:  vAddr <- ((offset_15)^48 || offset_15...0) + GPR[base] 
          (pAddr, uncached) <- AddressTranslation (vAddr, DATA) 
          pAddr <- pAddr_PSIZE-1...3 || (pAddr_2...0 xor (ReverseEndian || 0^2)) 
          byte <- vAddr_2...0 xor (BigEndianCPU || 0^2) 
          data <- COPzSW (byte, rt) 
          StoreMemory (uncached, WORD, data, pAddr, vAddr, DATA) 



*See the table "Opcode Bit Encoding" on next page, or "CPU Instruction 
Opcode Bit Encoding" at the end of Appendix A. 



Exceptions: 

TLB refill exception 
TLB invalid exception 
TLB modification exception 
Bus error exception 
Address error exception 
Coprocessor unusable exception 



Opcode Bit Encoding: 

    SWCz     Bit #  31  30  29  28  27  26  ...  0 
    SWC1             1   1   1   0   0   1 
    SWC2             1   1   1   0   1   0 

    Bits 31...28   SW opcode 
    Bits 27...26   Coprocessor Unit Number 
    31     26  25    21  20    16  15                 0 
    SWL        base      rt        offset 
    101010 
    6          5         5         16 





Format: 

SWL rt, offset(base) 

Description: 

This instruction can be used with the SWR instruction to store the contents 
of a register into four consecutive bytes of memory, when the bytes cross 
a word boundary. SWL stores the left portion of the register into the 
appropriate part of the high-order word of memory; SWR stores the right 
portion of the register into the appropriate part of the low-order word. 

The SWL instruction adds its sign-extended 16-bit offset to the contents of 
general register base to form a virtual address which may specify an 
arbitrary byte. It alters only the word in memory which contains that byte. 
From one to four bytes will be stored, depending on the starting byte 
specified. 

Conceptually, it starts at the most-significant byte of the register and 
copies it to the specified byte in memory; then it copies bytes from register 
to memory until it reaches the low-order byte of the word in memory. 

No address exceptions due to alignment are possible. 



[Figure: SWL $24,1($0) on big-endian memory. Before the store, register $24 holds A B C D and the word at address 0 holds bytes 0 1 2 3; after the store, the word at address 0 holds 0 A B C and the word at address 4 is unchanged.] 



Operation: 

32    T:  vAddr <- ((offset_15)^16 || offset_15...0) + GPR[base] 
          (pAddr, uncached) <- AddressTranslation (vAddr, DATA) 
          pAddr <- pAddr_PSIZE-1...3 || (pAddr_2...0 xor ReverseEndian^3) 
          if BigEndianMem = 0 then 
              pAddr <- pAddr_PSIZE-1...2 || 0^2 
          endif 
          byte <- vAddr_1...0 xor BigEndianCPU^2 
          if (vAddr_2 xor BigEndianCPU) = 0 then 
              data <- 0^32 || 0^(24-8*byte) || GPR[rt]_31...24-8*byte 
          else 
              data <- 0^(24-8*byte) || GPR[rt]_31...24-8*byte || 0^32 
          endif 
          StoreMemory (uncached, byte, data, pAddr, vAddr, DATA) 

64    T:  vAddr <- ((offset_15)^48 || offset_15...0) + GPR[base] 
          (pAddr, uncached) <- AddressTranslation (vAddr, DATA) 
          pAddr <- pAddr_PSIZE-1...3 || (pAddr_2...0 xor ReverseEndian^3) 
          if BigEndianMem = 0 then 
              pAddr <- pAddr_PSIZE-1...2 || 0^2 
          endif 
          byte <- vAddr_1...0 xor BigEndianCPU^2 
          if (vAddr_2 xor BigEndianCPU) = 0 then 
              data <- 0^32 || 0^(24-8*byte) || GPR[rt]_31...24-8*byte 
          else 
              data <- 0^(24-8*byte) || GPR[rt]_31...24-8*byte || 0^32 
          endif 
          StoreMemory (uncached, byte, data, pAddr, vAddr, DATA) 



Given a doubleword in a register and a doubleword in memory, the 
operation of SWL is as follows: 



SWL 

    Register   A B C D E F G H 
    Memory 

    LEM      Little-endian memory (BigEndianMem = 0) 
    BEM      BigEndianMem = 1 
    Type     AccessType (see Table 2-1) sent to memory 
    Offset   pAddr_2...0 sent to memory 



Exceptions: 

TLB refill exception 
TLB invalid exception 
TLB modification exception 
Bus error exception 
Address error exception 
SWR                     Store Word Right                     SWR 
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Format: 

SWR rt, offset(base) 
Description: 



This instruction can be used with the SWL instruction to store the contents 
of a register into four consecutive bytes of memory, when the bytes cross 
a boundary between two words. SWR stores the right portion of the 
register into the appropriate part of the low-order word; SWL stores the 
left portion of the register into the appropriate part of the high-order word 
of memory. 

The SWR instruction adds its sign-extended 16-bit offset to the contents of 
general register base to form a virtual address which may specify an 
arbitrary byte. It alters only the word in memory which contains that byte. 
From one to four bytes will be stored, depending on the starting byte 
specified. 

Conceptually, it starts at the least-significant (rightmost) byte of the 
register and copies it to the specified byte in memory; then it copies bytes 
from register to memory until it reaches the high-order byte of the word in 
memory. 

No address exceptions due to alignment are possible. 






Operation: 



32    T:  vAddr <- ((offset_15)^16 || offset_15...0) + GPR[base] 
          (pAddr, uncached) <- AddressTranslation (vAddr, DATA) 
          pAddr <- pAddr_PSIZE-1...3 || (pAddr_2...0 xor ReverseEndian^3) 
          if BigEndianMem = 0 then 
              pAddr <- pAddr_PSIZE-1...2 || 0^2 
          endif 
          byte <- vAddr_1...0 xor BigEndianCPU^2 
          if (vAddr_2 xor BigEndianCPU) = 0 then 
              data <- 0^32 || GPR[rt]_31-8*byte...0 || 0^(8*byte) 
          else 
              data <- GPR[rt]_31-8*byte...0 || 0^(8*byte) || 0^32 
          endif 
          StoreMemory (uncached, WORD-byte, data, pAddr, vAddr, DATA) 

64    T:  vAddr <- ((offset_15)^48 || offset_15...0) + GPR[base] 
          (pAddr, uncached) <- AddressTranslation (vAddr, DATA) 
          pAddr <- pAddr_PSIZE-1...3 || (pAddr_2...0 xor ReverseEndian^3) 
          if BigEndianMem = 0 then 
              pAddr <- pAddr_PSIZE-1...2 || 0^2 
          endif 
          byte <- vAddr_1...0 xor BigEndianCPU^2 
          if (vAddr_2 xor BigEndianCPU) = 0 then 
              data <- 0^32 || GPR[rt]_31-8*byte...0 || 0^(8*byte) 
          else 
              data <- GPR[rt]_31-8*byte...0 || 0^(8*byte) || 0^32 
          endif 
          StoreMemory (uncached, WORD-byte, data, pAddr, vAddr, DATA) 






Given a doubleword in a register and a doubleword in memory, the 
operation of SWR is as follows: 



    LEM      Little-endian memory (BigEndianMem = 0) 
    BEM      BigEndianMem = 1 
    Type     AccessType (see Table 2-1) sent to memory 
    Offset   pAddr_2...0 sent to memory 



Exceptions: 

TLB refill exception 
TLB invalid exception 
TLB modification exception 
Bus error exception 
Address error exception 
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SYNC Synchronize SYNC 
    SPECIAL    0                           SYNC 
               0000 0000 0000 0000 0000 
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Format: 

SYNC 
Description: 

The SYNC instruction ensures that any loads and stores fetched prior to the 
present instruction are completed before any loads or stores after this 
instruction are allowed to start. Use of the SYNC instruction to serialize 
certain memory references may be required in a multiprocessor 
environment for proper synchronization. For example: 



        Processor A                 Processor B 

        SW    R1, DATA          1:  LW    R2, FLAG 
        LI    R2, 1                 BEQ   R2, R0, 1B 
        SYNC                        NOP 
        SW    R2, FLAG              SYNC 
                                    LW    R1, DATA 



The SYNC in processor A prevents DATA from being written after FLAG, 
which could cause processor B to read stale data. The SYNC in processor 
B prevents DATA from being read before FLAG, which could likewise 
result in reading stale data. For processors which only execute loads and 
stores in order, with respect to shared memory, this instruction is a NOP. 

LL and SC instructions implicitly perform a SYNC. 

This instruction is allowed in User mode. 

Operation: 

32, 64    T:  SyncOperation() 

Exceptions: 

None 
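
The DATA/FLAG handshake described above has a close analogue in portable C11, where a full memory fence plays the role of SYNC. The sketch below is an analogy only, not a description of how any compiler maps fences onto this processor; the names shared_data, flag, producer and consumer_poll are hypothetical. 

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

static uint32_t shared_data;                  /* DATA */
static atomic_uint flag;                      /* FLAG, initially zero */

/* Processor A: write DATA, fence, then raise FLAG. */
static void producer(uint32_t value)
{
    shared_data = value;                                    /* SW R1, DATA */
    atomic_thread_fence(memory_order_seq_cst);              /* SYNC */
    atomic_store_explicit(&flag, 1, memory_order_relaxed);  /* SW R2, FLAG */
}

/* Processor B: one pass of the polling loop. Returns false while FLAG
 * is still zero; otherwise fences and then reads DATA. */
static bool consumer_poll(uint32_t *out)
{
    if (atomic_load_explicit(&flag, memory_order_relaxed) == 0)
        return false;                                       /* LW / BEQ */
    atomic_thread_fence(memory_order_seq_cst);              /* SYNC */
    *out = shared_data;                                     /* LW R1, DATA */
    return true;
}
```

In a real program producer and consumer_poll would run on different threads; the fence on each side is what orders the DATA access against the FLAG access. 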
SYSCALL                   System Call                   SYSCALL 
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Format: 

SYSCALL 

Description: 

A system call exception occurs, immediately and unconditionally 
transferring control to the exception handler. 

The code field is available for use as software parameters, but is retrieved 
by the exception handler only by loading the contents of the memory word 
containing the instruction. 

Operation: 

32, 64 T: SystemCallException 

Exceptions: 

System Call exception 
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Format: 

TEQ rs, rt 

Description: 

The contents of general register rt are compared to general register rs. If 
the contents of general register rs are equal to the contents of general 
register rt, a trap exception occurs. 

The code field is available for use as software parameters, but is retrieved 
by the exception handler only by loading the contents of the memory word 
containing the instruction. 

Operation: 

32, 64 T: if GPR[rs] = GPR[rt] then 
TrapException 
endif 

Exceptions: 

Trap exception 
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TEQI Trap If Equal Immediate TEQI 
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Format: 

TEQI rs, immediate 

Description: 

The 16-bit immediate is sign-extended and compared to the contents of 
general register rs. If the contents of general register rs are equal to the 
sign-extended immediate, a trap exception occurs. 

Operation: 

32    T:  if GPR[rs] = (immediate_15)^16 || immediate_15...0 then 
              TrapException 
          endif 

64    T:  if GPR[rs] = (immediate_15)^48 || immediate_15...0 then 
              TrapException 
          endif 



Exceptions: 

Trap exception 
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Trap If Greater Than Or Equal 
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Format: 

TGE rs, rt 

Description: 

The contents of general register rt are compared to the contents of general 
register rs. Considering both quantities as signed integers, if the contents 
of general register rs are greater than or equal to the contents of general 
register rt, a trap exception occurs. 

The code field is available for use as software parameters, but is retrieved 
by the exception handler only by loading the contents of the memory word 
containing the instruction. 



Operation: 



32, 64    T:  if GPR[rs] >= GPR[rt] then 
                  TrapException 
              endif 



Exceptions: 

Trap exception 
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Format: 

TGEI rs, immediate 

Description: 

The 16-bit immediate is sign-extended and compared to the contents of 
general register rs. Considering both quantities as signed integers, if the 
contents of general register rs are greater than or equal to the sign- 
extended immediate, a trap exception occurs. 

Operation: 



32    T:  if GPR[rs] >= (immediate_15)^16 || immediate_15...0 then 
              TrapException 
          endif 

64    T:  if GPR[rs] >= (immediate_15)^48 || immediate_15...0 then 
              TrapException 
          endif 





Exceptions: 

Trap exception 
TGEIU     Trap If Greater Than Or Equal Immediate Unsigned     TGEIU 

    31     26  25    21  20    16  15                 0 
    REGIMM     rs        TGEIU     immediate 
    000001               01001 
    6          5         5         16 







Format: 

TGEIU rs, immediate 

Description: 

The 16-bit immediate is sign-extended and compared to the contents of 
general register rs. Considering both quantities as unsigned integers, if the 
contents of general register rs are greater than or equal to the sign- 
extended immediate, a trap exception occurs. 

Operation: 



32    T: if (0 || GPR[rs]) ≥ (0 || (immediate_15)^16 || immediate_15...0) then
             TrapException
         endif

64    T: if (0 || GPR[rs]) ≥ (0 || (immediate_15)^48 || immediate_15...0) then
             TrapException
         endif





Exceptions: 

Trap exception 
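
The difference between the signed (TGEI) and unsigned (TGEIU) forms can be sketched in C for 32-bit mode. This is an illustration only: the function names are hypothetical, and the sketch evaluates just the trap condition rather than modeling the exception itself. Note that the immediate is sign-extended in both cases; only the comparison changes.

```c
#include <stdbool.h>
#include <stdint.h>

/* (immediate_15)^16 || immediate_15...0: replicate bit 15 into the upper half. */
static uint32_t sign_extend16(uint16_t imm)
{
    return (imm & 0x8000u) ? (0xFFFF0000u | imm) : (uint32_t)imm;
}

/* TGEI condition, 32-bit mode: both operands treated as signed integers. */
static bool tgei_traps(uint32_t rs, uint16_t imm)
{
    return (int32_t)rs >= (int32_t)sign_extend16(imm);
}

/* TGEIU condition, 32-bit mode: the same bit patterns compared as unsigned. */
static bool tgeiu_traps(uint32_t rs, uint16_t imm)
{
    return rs >= sign_extend16(imm);
}
```

With an immediate of 0xFFFF, the signed form compares against -1, while the unsigned form compares against the largest 32-bit value, so TGEIU traps only when rs holds all ones.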
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Format: 

TGEU rs, rt 

Description: 

The contents of general register rt are compared to the contents of general
register rs. Considering both quantities as unsigned integers, if the
contents of general register rs are greater than or equal to the contents of
general register rt, a trap exception occurs.

The code field is available for use as software parameters, but is retrieved 
by the exception handler only by loading the contents of the memory word 
containing the instruction. 

Operation: 

32, 64 T: if (0 || GPR[rs]) ≥ (0 || GPR[rt]) then
              TrapException
          endif



Exceptions: 

Trap exception 
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Format: 

TLBP

Description:

The Index register is loaded with the address of the TLB entry whose
contents match the contents of the EntryHi register. If no TLB entry
matches, the high-order bit of the Index register is set.

The architecture does not specify the operation of memory references
associated with the instruction immediately after a TLBP instruction, nor
is the operation specified if more than one TLB entry matches.

Operation:



32    T: Index <- 1 || 0^25 || undefined^6
         for i in 0...TLBEntries-1
             if (TLB[i]_95...77 = EntryHi_31...12) and (TLB[i]_76 or
                (TLB[i]_71...64 = EntryHi_7...0)) then
                 Index <- 0^26 || i_5...0
             endif
         endfor

64    T: Index <- 1 || 0^25 || undefined^6
         for i in 0...TLBEntries-1
             if ((TLB[i]_167...141 and not (0^15 || TLB[i]_216...205))
                = (EntryHi_39...13 and not (0^15 || TLB[i]_216...205))) and
                (TLB[i]_140 or (TLB[i]_135...128 = EntryHi_7...0)) then
                 Index <- 0^26 || i_5...0
             endif
         endfor



Exceptions: 

Coprocessor unusable exception 
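
The 32-bit probe loop can be sketched in C as follows. This is a minimal model under stated assumptions, not the hardware algorithm: the `tlb_entry32` structure, the function name, and the `n` parameter are hypothetical; each 128-bit entry is split into four 32-bit words; and the `hi` word is assumed to hold the virtual page field in bits 31..13, the G bit in bit 12 (entry bit 76), and the ASID in bits 7..0. As in the 32-bit pseudocode, the page mask is not applied.

```c
#include <stdint.h>

/* One 128-bit TLB entry, 32-bit mode, split into words (assumed layout). */
typedef struct {
    uint32_t page_mask; /* TLB[i]_127...96 */
    uint32_t hi;        /* TLB[i]_95...64: page field 31..13, G bit 12, ASID 7..0 */
    uint32_t lo1;       /* TLB[i]_63...32 */
    uint32_t lo0;       /* TLB[i]_31...0 */
} tlb_entry32;

#define INDEX_P 0x80000000u /* high-order bit of Index: probe failure */

/* Returns the new Index value after probing n entries with entry_hi. */
static uint32_t tlbp32(const tlb_entry32 *tlb, unsigned n, uint32_t entry_hi)
{
    /* 1 || 0^25 || undefined^6; the undefined low bits are zeroed here. */
    uint32_t index = INDEX_P;

    for (unsigned i = 0; i < n; i++) {
        int page_match = (tlb[i].hi >> 13) == (entry_hi >> 13);
        int global     = (int)((tlb[i].hi >> 12) & 1u);
        int asid_match = (tlb[i].hi & 0xFFu) == (entry_hi & 0xFFu);

        if (page_match && (global || asid_match))
            index = i & 0x3Fu; /* 0^26 || i_5...0 */
    }
    return index;
}
```

A caller would test the high-order bit of the returned value to distinguish a miss from a hit before using the low six bits as an entry number.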
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Format: 

TLBR 

Description: 

The G bit (which controls ASID matching) read from the TLB is written
into both the EntryLo0 and EntryLo1 registers.

The EntryHi and EntryLo registers are loaded with the contents of the TLB
entry pointed at by the contents of the TLB Index register. The operation
is invalid (and the results are unspecified) if the contents of the TLB Index
register are greater than the number of TLB entries in the processor.



Operation: 



32    T: PageMask <- TLB[Index_5...0]_127...96
         EntryHi  <- TLB[Index_5...0]_95...64 and not TLB[Index_5...0]_127...96
         EntryLo1 <- TLB[Index_5...0]_63...32
         EntryLo0 <- TLB[Index_5...0]_31...0

64    T: PageMask <- TLB[Index_5...0]_255...192
         EntryHi  <- TLB[Index_5...0]_191...128 and not TLB[Index_5...0]_255...192
         EntryLo1 <- TLB[Index_5...0]_127...65 || TLB[Index_5...0]_140
         EntryLo0 <- TLB[Index_5...0]_63...1 || TLB[Index_5...0]_140



Exceptions: 

Coprocessor unusable exception 






MIPS R4000 Microprocessor User's Manual 



CPU Instruction Set Details 



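TLBWI    Write Indexed TLB Entry    TLBWI

 31      26  25  24                                   6 5        0
+----------+----+--------------------------------------+----------+
|   COP0   | CO |                  0                   |  TLBWI   |
|  010000  | 1  |       000 0000 0000 0000 0000        |  000010  |
+----------+----+--------------------------------------+----------+
      6      1                    19                         6
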

Format: 

TLBWI 

Description: 

The G bit of the TLB is written with the logical AND of the G bits in the
EntryLo0 and EntryLo1 registers.

The TLB entry pointed at by the contents of the TLB Index register is loaded
with the contents of the EntryHi and EntryLo registers.

The operation is invalid (and the results are unspecified) if the contents of
the TLB Index register are greater than the number of TLB entries in the
processor.

Operation:

32, 64 T: TLB[Index_5...0] <-
              PageMask || (EntryHi and not PageMask) || EntryLo1 || EntryLo0



Exceptions: 

Coprocessor unusable exception 
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Format: 

TLBWR 

Description: 

The G bit of the TLB is written with the logical AND of the G bits in the
EntryLo0 and EntryLo1 registers.

The TLB entry pointed at by the contents of the TLB Random register is
loaded with the contents of the EntryHi and EntryLo registers.

Operation:

32, 64 T: TLB[Random_5...0] <-
              PageMask || (EntryHi and not PageMask) || EntryLo1 || EntryLo0



Exceptions: 

Coprocessor unusable exception 
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Format: 

TLT rs, rt 

Description: 

The contents of general register rt are compared to general register rs. 
Considering both quantities as signed integers, if the contents of general 
register rs are less than the contents of general register rt, a trap exception 
occurs. 

The code field is available for use as software parameters, but is retrieved 
by the exception handler only by loading the contents of the memory word 
containing the instruction. 

Operation: 



32,64 T: if GPR[rs] < GPR[rt] then 
TrapException 
endif 



Exceptions: 

Trap exception 
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Format: 

TLTI rs, immediate

Description:

The 16-bit immediate is sign-extended and compared to the contents of
general register rs. Considering both quantities as signed integers, if the
contents of general register rs are less than the sign-extended immediate, a
trap exception occurs.

Operation: 



32    T: if GPR[rs] < (immediate_15)^16 || immediate_15...0 then
             TrapException
         endif

64    T: if GPR[rs] < (immediate_15)^48 || immediate_15...0 then
             TrapException
         endif





Exceptions: 

Trap exception 
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Format: 

TLTIU rs, immediate 

Description: 

The 16-bit immediate is sign-extended and compared to the contents of
general register rs. Considering both quantities as unsigned integers, if the
contents of general register rs are less than the sign-extended immediate, a
trap exception occurs.



Operation: 



32    T: if (0 || GPR[rs]) < (0 || (immediate_15)^16 || immediate_15...0) then
             TrapException
         endif

64    T: if (0 || GPR[rs]) < (0 || (immediate_15)^48 || immediate_15...0) then
             TrapException
         endif







Exceptions: 

Trap exception 
TLTU    Trap If Less Than Unsigned    TLTU
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Format: 

TLTU rs, rt 

Description: 

The contents of general register rt are compared to general register rs. 
Considering both quantities as unsigned integers, if the contents of 
general register rs are less than the contents of general register rt, a trap 
exception occurs. 

The code field is available for use as software parameters, but is retrieved 
by the exception handler only by loading the contents of the memory word 
containing the instruction. 

Operation: 

32, 64 T: if (0 || GPR[rs]) < (0 || GPR[rt]) then
              TrapException
          endif

Exceptions: 

Trap exception 
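TNE    Trap If Not Equal    TNE

 31      26 25      21 20      16 15                   6 5        0
+----------+----------+----------+----------------------+----------+
| SPECIAL  |    rs    |    rt    |         code         |   TNE    |
|  000000  |          |          |                      |  110110  |
+----------+----------+----------+----------------------+----------+
      6          5          5               10                6
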

Format: 

TNE rs, rt 

Description: 

The contents of general register rt are compared to general register rs. If 
the contents of general register rs are not equal to the contents of general 
register rt, a trap exception occurs. 

The code field is available for use as software parameters, but is retrieved 
by the exception handler only by loading the contents of the memory word 
containing the instruction. 

Operation: 

32, 64 T: if GPR[rs] ≠ GPR[rt] then
              TrapException
          endif



Exceptions: 

Trap exception 
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TNEI Trap If Not Equal Immediate TNEI 



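 31      26 25      21 20      16 15                              0
+----------+----------+----------+---------------------------------+
|  REGIMM  |    rs    |   TNEI   |            immediate            |
|  000001  |          |  01110   |                                 |
+----------+----------+----------+---------------------------------+
      6          5          5                    16
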
Format: 

TNEI rs, immediate 

Description: 

The 16-bit immediate is sign-extended and compared to the contents of 
general register rs. If the contents of general register rs are not equal to the 
sign-extended immediate, a trap exception occurs. 

Operation: 

32    T: if GPR[rs] ≠ (immediate_15)^16 || immediate_15...0 then
             TrapException
         endif

64    T: if GPR[rs] ≠ (immediate_15)^48 || immediate_15...0 then
             TrapException
         endif



Exceptions: 

Trap exception 
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XOR Exclusive Or XOR 



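 31      26 25      21 20      16 15      11 10       6 5        0
+----------+----------+----------+----------+----------+----------+
| SPECIAL  |    rs    |    rt    |    rd    |    0     |   XOR    |
|  000000  |          |          |          |  00000   |  100110  |
+----------+----------+----------+----------+----------+----------+
      6          5          5          5          5          6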



Format: 

XOR rd, rs, rt 

Description: 

The contents of general register rs are combined with the contents of 
general register rt in a bit-wise logical exclusive OR operation. 

The result is placed into general register rd. 
Operation: 



32, 64 T: GPR[rd] <- GPR[rs] xor GPR[rt] 



Exceptions: 

None 
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XORI Exclusive OR Immediate XORI 



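 31      26 25      21 20      16 15                              0
+----------+----------+----------+---------------------------------+
|   XORI   |    rs    |    rt    |            immediate            |
|  001110  |          |          |                                 |
+----------+----------+----------+---------------------------------+
      6          5          5                    16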





Format: 

XORI rt, rs, immediate 

Description: 

The 16-bit immediate is zero-extended and combined with the contents of 
general register rs in a bit-wise logical exclusive OR operation. 

The result is placed into general register rt.

Operation:

32    T: GPR[rt] <- GPR[rs] xor (0^16 || immediate)
64    T: GPR[rt] <- GPR[rs] xor (0^48 || immediate)



Exceptions: 

None 
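
Unlike the trap-immediate instructions, XORI zero-extends its immediate, so only the low-order 16 bits of rs can change. A minimal C sketch of the 64-bit operation follows; the function name is hypothetical and register access is reduced to plain arguments.

```c
#include <stdint.h>

/* GPR[rt] <- GPR[rs] xor (0^48 || immediate), 64-bit mode. */
static uint64_t xori64(uint64_t rs, uint16_t immediate)
{
    /* Widening an unsigned 16-bit value fills the upper 48 bits with zeros. */
    return rs ^ (uint64_t)immediate;
}
```

The upper 48 bits of the result are always a copy of the upper 48 bits of rs.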
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CPU Instruction Opcode Bit Encoding 

The remainder of this Appendix presents the opcode bit encoding for the 
CPU instruction set (ISA and extensions), as implemented by the R4000. 
Figure A-2 lists the R4000 Opcode Bit Encoding. 
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Figure A-2 R4000 Opcode Bit Encoding 
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Figure A-2 (cont.) R4000 Opcode Bit Encoding 



Key: 

*    Operation codes marked with an asterisk cause reserved
     instruction exceptions in all current implementations and are
     reserved for future versions of the architecture.

γ    Operation codes marked with a gamma cause a reserved
     instruction exception. They are reserved for future versions of the
     architecture.

δ    Operation codes marked with a delta are valid only for R4000
     processors with CP0 enabled, and cause a reserved instruction
     exception on other processors.

φ    Operation codes marked with a phi are invalid but do not cause
     reserved instruction exceptions in R4000 implementations.

ξ    Operation codes marked with a xi cause a reserved instruction
     exception on R4000 processors.

χ    Operation codes marked with a chi are valid only on R4000.

ε    Operation codes marked with epsilon are valid when the processor
     is operating either in the Kernel mode or in the 64-bit non-Kernel
     (User or Supervisor) mode. These instructions cause a reserved
     instruction exception if 64-bit operation is not enabled in User or
     Supervisor mode.
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B 



This appendix provides a detailed description of each floating-point unit 
(FPU) instruction (refer to Appendix A for a detailed description of the 
CPU instructions). The instructions are listed alphabetically, and any 
exceptions that may occur due to the execution of each instruction are 
listed after the description of each instruction. Descriptions of the 
immediate causes and the manner of handling exceptions are omitted 
from the instruction descriptions in this appendix (refer to Chapter 7 for 
detailed descriptions of floating-point exceptions and handling). 

Figure B-3 at the end of this appendix lists the entire bit encoding for the 
constant fields of the floating-point instruction set; the bit encoding for 
each instruction is included with that individual instruction. 
B.1 Instruction Formats

There are three basic instruction format types:

• I-Type, or Immediate instructions, which include load and
  store operations

• M-Type, or Move instructions

• R-Type, or Register instructions, which include the two-
  and three-register floating-point operations.

The instruction description subsections that follow show how these three 
basic instruction formats are used by: 

• Load and store instructions 

• Move instructions 

• Floating-Point computational instructions 

• Floating-Point branch instructions 

Floating-point instructions are mapped onto the MIPS coprocessor
instructions, defining coprocessor unit number one (CP1) as the
floating-point unit.

Each operation is valid only for certain formats. Implementations may 
support some of these formats and operations through emulation, but 
they only need to support combinations that are valid (marked V in Table 
B-1). Combinations marked R in Table B-1 are not currently specified by 
this architecture, and cause an unimplemented operation trap. They will 
be available for future extensions to the architecture. 
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Table B-1 Valid FPU Instruction Formats 



                          Source Format
Operation     Single    Double    Word    Longword

ADD             V         V        R         R
SUB             V         V        R         R
MUL             V         V        R         R
DIV             V         V        R         R
SQRT            V         V        R         R
ABS             V         V        R         R
MOV             V         V
NEG             V         V        R         R
TRUNC.L         V         V
ROUND.L         V         V
CEIL.L          V         V
FLOOR.L         V         V
TRUNC.W         V         V
ROUND.W         V         V
CEIL.W          V         V
FLOOR.W         V         V
CVT.S                     V        V         V
CVT.D           V                  V         V
CVT.W           V         V
CVT.L           V         V
C               V         V        R         R
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The coprocessor branch on condition true/false instructions can be used 
to logically negate any predicate. Thus, the 32 possible conditions require 
only 16 distinct comparisons, as shown in Table B-2 below. 



Table B-2 Logical Negation of Predicates by Condition True/False 



       Condition                       Relations                  Invalid Operation
   Mnemonic              Greater    Less                            Exception If
 True    False    Code    Than      Than     Equal    Unordered      Unordered

 F       T          0      F         F        F          F              No
 UN      OR         1      F         F        F          T              No
 EQ      NEQ        2      F         F        T          F              No
 UEQ     OGL        3      F         F        T          T              No
 OLT     UGE        4      F         T        F          F              No
 ULT     OGE        5      F         T        F          T              No
 OLE     UGT        6      F         T        T          F              No
 ULE     OGT        7      F         T        T          T              No
 SF      ST         8      F         F        F          F              Yes
 NGLE    GLE        9      F         F        F          T              Yes
 SEQ     SNE       10      F         F        T          F              Yes
 NGL     GL        11      F         F        T          T              Yes
 LT      NLT       12      F         T        F          F              Yes
 NGE     GE        13      F         T        F          T              Yes
 LE      NLE       14      F         T        T          F              Yes
 NGT     GT        15      F         T        T          T              Yes
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Floating-Point Loads, Stores, and Moves 

All movement of data between the floating-point coprocessor and
memory is accomplished by coprocessor load and store operations, which
reference the floating-point coprocessor General Purpose registers. These
operations are unformatted; no format conversions are performed and,
therefore, no floating-point exceptions can occur due to these operations.

Data may also be directly moved between the floating-point coprocessor
and the processor by move to coprocessor and move from coprocessor
instructions. Like the floating-point load and store operations, move to/
from operations perform no format conversions and never cause
floating-point exceptions.

An additional pair of coprocessor registers, called Floating-Point
Control registers, is available; the only data movement operations
supported for them are moves to and from processor General Purpose
registers.

Floating-Point Operations 

The floating-point unit operation set includes: 

• floating-point add 

• floating-point subtract 

• floating-point multiply 

• floating-point divide 

• floating-point square root 

• convert between fixed-point and floating-point formats 

• convert between floating-point formats 

• floating-point compare 

These operations satisfy the accuracy requirements of IEEE Standard 754.
Specifically, each operation obtains a result that is identical to an
infinite-precision result rounded to the specified format, using the
current rounding mode.

Instructions must specify the format of their operands. Except for 
conversion functions, mixed-format operations are not provided. 
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B.2 Instruction Notation Conventions 

In this appendix, all variable subfields in an instruction format (such as fs,
ft, immediate, and so on) are shown in lower-case. The instruction name
(such as ADD, SUB, and so on) is shown in upper-case.

For the sake of clarity, we sometimes use an alias for a variable subfield in 
the formats of specific instructions. For example, we use rs = base in the 
format for load and store instructions. Such an alias is always lower case, 
since it refers to a variable subfield. 

In some instructions, the instruction subfields op and function can have
constant 6-bit values. When reference is made to these instructions,
upper-case mnemonics are used. For instance, in the floating-point ADD
instruction we use op = COP1 and function = ADD. In other cases, a single
field has both fixed and variable subfields, so the name contains both
upper and lower case characters. Bit encodings for mnemonics are shown
in Figure B-3 at the end of this appendix, and are also included with each
individual instruction.

In the instruction description examples that follow, the Operation section 
describes the operation performed by each instruction using a high-level 
language notation. 

Instruction Notation Examples 

The following examples illustrate the application of some of the 
instruction notation conventions: 

Example #1:

    GPR[rt] <- immediate || 0^16

Sixteen zero bits are concatenated with an immediate
value (typically 16 bits), and the 32-bit string (with the lower
16 bits set to zero) is assigned to General Purpose Register rt.

Example #2:

    (immediate_15)^16 || immediate_15...0

Bit 15 (the sign bit) of an immediate value is extended for
16 bit positions, and the result is concatenated with bits 15
through 0 of the immediate value to form a 32-bit sign-extended
value.
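
For readers more familiar with C than with this notation, the two examples can be expressed as shifts and masks. The sketch below is illustrative only; the function names are hypothetical, and a 16-bit immediate is assumed in both cases.

```c
#include <stdint.h>

/* Example #1: immediate || 0^16 (the immediate lands in the upper half). */
static uint32_t concat_imm_zeros(uint16_t immediate)
{
    return (uint32_t)immediate << 16;
}

/* Example #2: (immediate_15)^16 || immediate_15...0 (32-bit sign extension). */
static uint32_t sign_extend_imm(uint16_t immediate)
{
    uint32_t sign  = (immediate >> 15) & 1u;     /* bit 15 of the immediate */
    uint32_t upper = sign ? 0xFFFFu : 0x0000u;   /* that bit replicated 16 times */
    return (upper << 16) | immediate;            /* concatenate with bits 15..0 */
}
```

Here `x^n` (replication) becomes a field of n identical bits and `||` (concatenation) becomes a shift followed by a bitwise OR.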
B.3 Load and Store Instructions

In the R4000 implementation, the instruction immediately following a 
load may use the contents of the register being loaded. In such cases, the 
hardware interlocks, requiring additional real cycles, so scheduling load 
delay slots is still desirable, although not required for functional code. 

The behavior of the load and store instructions depends on the width of
the FGRs.

• When the FR bit in the Status register equals zero, the
  Floating-Point General registers (FGRs) are 32 bits wide.

• When the FR bit in the Status register equals one, the
  Floating-Point General registers (FGRs) are 64 bits wide.

In the load and store operation descriptions, the functions listed in 
Table B-3 are used to summarize the handling of virtual addresses and 
physical memory. 



Table B-3 Load and Store Common Functions 



Function              Meaning

AddressTranslation    Uses the TLB to find the physical address given the
                      virtual address. The function fails and an exception
                      is taken if the required translation is not present
                      in the TLB.

LoadMemory            Uses the cache and main memory to find the contents of
                      the word containing the specified physical address.
                      The low-order two bits of the address and the Access
                      Type field indicate which of the four bytes within the
                      data word need to be returned. If the cache is enabled
                      for this access, the entire word is returned and
                      loaded into the cache.

StoreMemory           Uses the cache, write buffer, and main memory to store
                      the word or part of word specified as data in the word
                      containing the specified physical address. The
                      low-order two bits of the address and the Access Type
                      field indicate which of the four bytes within the data
                      word should be stored.
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Figure B-1 shows the I-Type instruction format used by load and store 
operations. 

I-Type (Immediate)

 31      26 25      21 20      16 15                              0
+----------+----------+----------+---------------------------------+
|    op    |   base   |    ft    |             offset              |
+----------+----------+----------+---------------------------------+
      6          5          5                    16

op      is a 6-bit operation code
base    is the 5-bit base register specifier
ft      is a 5-bit source (for stores) or destination (for loads) FPA
        register specifier
offset  is the 16-bit signed immediate offset

Figure B-1 Load and Store Instruction Format



All coprocessor loads and stores reference aligned data items. Thus, for
word loads and stores, the access type field is always WORD, and the
low-order two bits of the address must always be zero.

For doubleword loads and stores, the access type field is always
DOUBLEWORD, and the low-order three bits of the address must always
be zero.

Regardless of byte-numbering order (endianness), the address specifies 
that byte which has the smallest byte-address in the addressed field. For 
a big-endian machine, this is the leftmost byte; for a little-endian machine, 
this is the rightmost byte. 
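
The alignment requirements above amount to testing the low-order address bits. The following C sketch shows the checks; the helper names are hypothetical, and the sketch says nothing about which exception the processor raises for a misaligned access.

```c
#include <stdbool.h>
#include <stdint.h>

/* WORD access: the low-order two address bits must be zero. */
static bool word_aligned(uint64_t address)
{
    return (address & 0x3u) == 0;
}

/* DOUBLEWORD access: the low-order three address bits must be zero. */
static bool doubleword_aligned(uint64_t address)
{
    return (address & 0x7u) == 0;
}
```

An address such as 0x1004 therefore satisfies the word requirement but not the doubleword requirement.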
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B.4 Computational Instructions 

Computational instructions include all of the arithmetic floating-point 
operations performed by the FPU. 

Figure B-2 shows the R-Type instruction format used for computational 
operations. 



R-Type (Register)

 31      26 25      21 20      16 15      11 10       6 5        0
+----------+----------+----------+----------+----------+----------+
|   COP1   |   fmt    |    ft    |    fs    |    fd    | function |
+----------+----------+----------+----------+----------+----------+
      6          5          5          5          5          6

COP1      is a 6-bit operation code
fmt       is a 5-bit format specifier
fs        is a 5-bit source1 register
ft        is a 5-bit source2 register
fd        is a 5-bit destination register
function  is a 6-bit function field

Figure B-2 Computational Instruction Format

The function field indicates the floating-point operation to be performed. 

Each floating-point instruction can be applied to a number of operand 
formats. The operand format for an instruction is specified by the 5-bit 
format field; decoding for this field is shown in Table B-4. 



Table B-4 Format Field Decoding 



Code     Mnemonic    Size        Format

16       S           single      Binary floating-point
17       D           double      Binary floating-point
18                               Reserved
19                               Reserved
20       W           single      32-bit binary fixed-point
21       L           longword    64-bit binary fixed-point
22-31                            Reserved
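
As an illustration of the R-Type layout in Figure B-2 and the format codes in Table B-4, the fields of a 32-bit instruction word can be extracted with shifts and masks. The structure and function names below are hypothetical, and the sketch makes no attempt to validate reserved encodings beyond naming them.

```c
#include <stdint.h>

/* Fields of an R-Type floating-point instruction, per Figure B-2. */
typedef struct {
    unsigned op;       /* bits 31..26 */
    unsigned fmt;      /* bits 25..21 */
    unsigned ft;       /* bits 20..16 */
    unsigned fs;       /* bits 15..11 */
    unsigned fd;       /* bits 10..6  */
    unsigned function; /* bits 5..0   */
} fpu_rtype;

static fpu_rtype decode_rtype(uint32_t insn)
{
    fpu_rtype f;
    f.op       = (insn >> 26) & 0x3Fu;
    f.fmt      = (insn >> 21) & 0x1Fu;
    f.ft       = (insn >> 16) & 0x1Fu;
    f.fs       = (insn >> 11) & 0x1Fu;
    f.fd       = (insn >> 6)  & 0x1Fu;
    f.function = insn & 0x3Fu;
    return f;
}

/* Mnemonic for a format code, per Table B-4. */
static const char *fmt_name(unsigned fmt)
{
    switch (fmt) {
    case 16: return "S";
    case 17: return "D";
    case 20: return "W";
    case 21: return "L";
    default: return "reserved";
    }
}
```

For example, a word with op = 010001 (COP1), fmt = 17, and function = 0 decodes as a double-precision ADD.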



Table B-5 lists all floating-point instructions. 
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Table B-5 Floating-Point Instructions and Operations 



Code
(5:0)    Mnemonic    Operation

0        ADD         Add
1        SUB         Subtract
2        MUL         Multiply
3        DIV         Divide
4        SQRT        Square root
5        ABS         Absolute value
6        MOV         Move
7        NEG         Negate
8        ROUND.L     Convert to 64-bit (long) fixed-point, rounded to
                     nearest/even
9        TRUNC.L     Convert to 64-bit (long) fixed-point, rounded toward zero
10       CEIL.L      Convert to 64-bit (long) fixed-point, rounded to +∞
11       FLOOR.L     Convert to 64-bit (long) fixed-point, rounded to -∞
12       ROUND.W     Convert to single fixed-point, rounded to nearest/even
13       TRUNC.W     Convert to single fixed-point, rounded toward zero
14       CEIL.W      Convert to single fixed-point, rounded to +∞
15       FLOOR.W     Convert to single fixed-point, rounded to -∞
16-31                Reserved
32       CVT.S       Convert to single floating-point
33       CVT.D       Convert to double floating-point
34                   Reserved
35                   Reserved
36       CVT.W       Convert to 32-bit binary fixed-point
37       CVT.L       Convert to 64-bit (long) binary fixed-point
38-47                Reserved
48-63    C           Floating-point compare
In the following pages, the notation FGR refers to the 32 General Purpose
registers FGR0 through FGR31 of the FPU, and FPR refers to the
floating-point registers of the FPU.

• When the FR bit in the Status register (SR(26)) equals zero, only
  the even floating-point registers are valid and the 32 General
  Purpose registers of the FPU are 32 bits wide.

• When the FR bit in the Status register (SR(26)) equals one, both
  odd and even floating-point registers may be used and the 32
  General Purpose registers of the FPU are 64 bits wide.

The following routines are used in the description of the floating-point
operations to retrieve the value of an FPR or to change the value of an FGR:



value <- ValueFPR(fpr, fmt)

    if SR_26 = 1 then        /* 64-bit wide FGRs */
        case fmt of
            S, W:
                value <- FGR[fpr]_31...0
                return
            D, L:
                value <- FGR[fpr]
                return
        endcase
    elseif fpr_0 = 0 then    /* valid specifier, 32-bit wide FGRs */
        case fmt of
            S, W:
                value <- FGR[fpr]
                return
            D, L:
                value <- FGR[fpr+1] || FGR[fpr]
                return
        endcase
    else                     /* undefined result for odd 32-bit reg #s */
        value <- undefined
    endif
StoreFPR(fpr, fmt, value)

    if SR_26 = 1 then        /* 64-bit wide FGRs */
        case fmt of
            S, W:
                FGR[fpr] <- undefined^32 || value
                return
            D, L:
                FGR[fpr] <- value
                return
        endcase
    elseif fpr_0 = 0 then    /* valid specifier, 32-bit wide FGRs */
        case fmt of
            S, W:
                FGR[fpr+1] <- undefined
                FGR[fpr] <- value
                return
            D, L:
                FGR[fpr+1] <- value_63...32
                FGR[fpr] <- value_31...0
                return
        endcase
    else                     /* undefined result for odd 32-bit reg #s */
        undefined_result
    endif
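
The two routines can be modeled in C roughly as follows. This is a sketch under stated assumptions rather than a description of the hardware: the `fpu_state` structure and all names are hypothetical, each FGR is stored in a 64-bit slot regardless of the FR bit, bits the pseudocode leaves undefined are simply zeroed, and a false return value stands in for the undefined-result case.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum { FMT_S, FMT_D, FMT_W, FMT_L } fpu_fmt;

typedef struct {
    bool     fr;      /* FR bit of the Status register (SR bit 26) */
    uint64_t fgr[32]; /* FGRs; only the low 32 bits are used when fr is false */
} fpu_state;

static bool is_64bit_fmt(fpu_fmt fmt)
{
    return fmt == FMT_D || fmt == FMT_L;
}

/* ValueFPR: returns false where the pseudocode yields an undefined value. */
static bool value_fpr(const fpu_state *s, unsigned fpr, fpu_fmt fmt, uint64_t *value)
{
    if (s->fr) {                              /* 64-bit wide FGRs */
        *value = is_64bit_fmt(fmt) ? s->fgr[fpr] : (s->fgr[fpr] & 0xFFFFFFFFu);
        return true;
    }
    if ((fpr & 1u) == 0) {                    /* even specifier, 32-bit FGRs */
        uint64_t lo = s->fgr[fpr] & 0xFFFFFFFFu;
        uint64_t hi = s->fgr[fpr + 1] & 0xFFFFFFFFu;
        *value = is_64bit_fmt(fmt) ? ((hi << 32) | lo) : lo;
        return true;
    }
    return false;                             /* odd specifier: undefined */
}

/* StoreFPR: returns false where the pseudocode yields an undefined result. */
static bool store_fpr(fpu_state *s, unsigned fpr, fpu_fmt fmt, uint64_t value)
{
    if (s->fr) {
        /* S, W: the upper 32 bits are undefined; this sketch zeroes them. */
        s->fgr[fpr] = is_64bit_fmt(fmt) ? value : (value & 0xFFFFFFFFu);
        return true;
    }
    if ((fpr & 1u) == 0) {
        s->fgr[fpr] = value & 0xFFFFFFFFu;
        /* S, W: FGR[fpr+1] is undefined; this sketch zeroes it. */
        s->fgr[fpr + 1] = is_64bit_fmt(fmt) ? (value >> 32) : 0;
        return true;
    }
    return false;
}
```

With FR equal to zero, a doubleword stored to register 2 is split across FGR[2] (low half) and FGR[3] (high half), and reading it back through register 2 reassembles the two halves.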
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ABS.fmt Absolute Value ABS.fmt 



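 31      26 25      21 20      16 15      11 10       6 5        0
+----------+----------+----------+----------+----------+----------+
|   COP1   |   fmt    |    0     |    fs    |    fd    |   ABS    |
|  010001  |          |  00000   |          |          |  000101  |
+----------+----------+----------+----------+----------+----------+
      6          5          5          5          5          6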



Format: 

ABS.fmt fd, fs 

Description: 

The contents of the FPU register specified by fs are interpreted in the
specified format and the arithmetic absolute value is taken. The result is
placed in the floating-point register specified by fd.

The absolute value operation is arithmetic; a NaN operand signals invalid 
operation. 

This instruction is valid only for single- and double-precision floating- 
point formats. The operation is not defined if bit 0 of any register 
specification is set and the FR bit in the Status register equals zero, since 
the register numbers specify an even-odd pair of adjacent coprocessor 
general registers. When the FR bit in the Status register equals one, both 
even and odd register numbers are valid. 

Operation: 



T: StoreFPR(fd, fmt, AbsoluteValue(ValueFPR(fs, fmt))) 



Exceptions: 

Coprocessor unusable exception 
Coprocessor exception trap 

Coprocessor Exceptions: 

Unimplemented operation exception 
Invalid operation exception 
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ADD.fmt Floating-Point Add ADD.fmt 



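 31      26 25      21 20      16 15      11 10       6 5        0
+----------+----------+----------+----------+----------+----------+
|   COP1   |   fmt    |    ft    |    fs    |    fd    |   ADD    |
|  010001  |          |          |          |          |  000000  |
+----------+----------+----------+----------+----------+----------+
      6          5          5          5          5          6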



Format: 

ADD.fmt fd, fs, ft 

Description: 

The contents of the FPU registers specified by fs and ft are interpreted in
the specified format and arithmetically added. The result is computed as if
to infinite precision and then rounded to the specified format (fmt),
according to the current rounding mode. The result is placed in the
floating-point register (FPR) specified by fd.

This instruction is valid only for single- and double-precision
floating-point formats. The operation is not defined if bit 0 of any register
specification is set and the FR bit in the Status register equals zero, since
the register numbers specify an even-odd pair of adjacent coprocessor
general registers. When the FR bit in the Status register equals one, both
even and odd register numbers are valid.

Operation: 



T: StoreFPR (fd, fmt, ValueFPR(fs, fmt) + ValueFPR(ft, fmt)) 



Exceptions: 

Coprocessor unusable exception 
Floating-Point exception 

Coprocessor Exceptions: 

Unimplemented operation exception 
Invalid operation exception 
Inexact exception 
Overflow exception 
Underflow exception 



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Branch On FPA False 
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Format: 

BC1F offset

Description:

A branch target address is computed from the sum of the address of the
instruction in the delay slot and the 16-bit offset, shifted left two bits and
sign-extended. If the result of the last floating-point compare is false
(zero), the program branches to the target address, with a delay of one
instruction.

There must be at least one instruction between C.cond.fmt and BC1F.

Operation:

32    T-1: condition <- not COC[1]
      T:   target <- (offset_15)^14 || offset || 0^2
      T+1: if condition then
               PC <- PC + target
           endif

64    T-1: condition <- not COC[1]
      T:   target <- (offset_15)^46 || offset || 0^2
      T+1: if condition then
               PC <- PC + target
           endif



Exceptions: 

Coprocessor unusable exception 
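
The target computation used by the floating-point branches can be sketched in C for 32-bit mode. The function name is hypothetical, and the sketch covers only the address arithmetic, taking the delay-slot address as an argument.

```c
#include <stdint.h>

/* target <- (offset_15)^14 || offset || 0^2, added to the delay-slot address. */
static uint32_t bc1_target32(uint32_t delay_slot_pc, uint16_t offset)
{
    /* Sign-extend the 16-bit offset to 32 bits. */
    uint32_t extended = (offset & 0x8000u) ? (0xFFFF0000u | offset)
                                           : (uint32_t)offset;
    /* Shifting left by two leaves 14 copies of the sign bit above the offset. */
    uint32_t target = extended << 2;
    /* Unsigned addition wraps modulo 2^32, matching two's-complement addition. */
    return delay_slot_pc + target;
}
```

An offset of 0xFFFF (that is, -1) therefore yields a target four bytes before the delay-slot instruction, which is the branch instruction itself.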
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Format: 

BC1FL offset

Description:

A branch target address is computed from the sum of the address of the
instruction in the delay slot and the 16-bit offset, shifted left two bits and
sign-extended. If the result of the last floating-point compare is false
(zero), the program branches to the target address, with a delay of one
instruction. If the conditional branch is not taken, the instruction in the
branch delay slot is nullified.

There must be at least one instruction between C.cond.fmt and BC1FL.

Operation:

32    T-1: condition <- not COC[1]
      T:   target <- (offset_15)^14 || offset || 0^2
      T+1: if condition then
               PC <- PC + target
           else
               NullifyCurrentInstruction
           endif

64    T-1: condition <- not COC[1]
      T:   target <- (offset_15)^46 || offset || 0^2
      T+1: if condition then
               PC <- PC + target
           else
               NullifyCurrentInstruction
           endif



Exceptions: 

Coprocessor unusable exception 
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Branch On FPU True 
(Coprocessor 1) 
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Format: 

BC1T offset

Description:

A branch target address is computed from the sum of the address of the
instruction in the delay slot and the 16-bit offset, shifted left two bits and
sign-extended. If the result of the last floating-point compare is true (one),
the program branches to the target address, with a delay of one
instruction.

There must be at least one instruction between C.cond.fmt and BC1T.

Operation:

32    T-1: condition <- COC[1]
      T:   target <- (offset_15)^14 || offset || 0^2
      T+1: if condition then
               PC <- PC + target
           endif

64    T-1: condition <- COC[1]
      T:   target <- (offset_15)^46 || offset || 0^2
      T+1: if condition then
               PC <- PC + target
           endif



Exceptions: 

Coprocessor unusable exception 






BC1TL    Branch On FPU True Likely    BC1TL
         (Coprocessor 1)
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Format: 

BC1TL offset

Description:

A branch target address is computed from the sum of the address of the
instruction in the delay slot and the 16-bit offset, shifted left two bits and
sign-extended. If the result of the last floating-point compare is true (one),
the program branches to the target address, with a delay of one
instruction. If the conditional branch is not taken, the instruction in the
branch delay slot is nullified.

There must be at least one instruction between C.cond.fmt and BC1TL.



Operation: 



32 T-1: condition <- COC[1]

   T:   target <- (offset_15)^14 || offset || 0^2

   T+1: if condition then
            PC <- PC + target
        else
            NullifyCurrentInstruction
        endif

64 T-1: condition <- COC[1]

   T:   target <- (offset_15)^46 || offset || 0^2

   T+1: if condition then
            PC <- PC + target
        else
            NullifyCurrentInstruction
        endif



Exceptions: 

Coprocessor unusable exception 
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Format: 

C.cond.fmt fs, ft 

Description: 

The contents of the floating-point registers specified by fs and ft are
interpreted in the specified format, fmt, and arithmetically compared.

A result is determined based on the comparison and the conditions 
specified in the cond field. If one of the values is a Not a Number (NaN), 
and the high-order bit of the cond field is set, an invalid operation 
exception is taken. After a one-instruction delay, the condition is available 
for testing with branch on floating-point coprocessor condition 
instructions. There must be at least one instruction between the compare 
and the branch. 

Comparisons are exact and can neither overflow nor underflow. Four 
mutually-exclusive relations are possible results: less than, equal, greater
than, and unordered. The last case arises when one or both of the
operands are NaN; every NaN compares unordered with everything, 
including itself. 

Comparisons ignore the sign of zero, so +0 = -0. 

This instruction is valid only for single- and double-precision floating- 
point formats. The operation is not defined if bit 0 of any register 
specification is set and the FR bit in the Status register equals zero, since 
the register numbers specify an even-odd pair of adjacent coprocessor 
general registers. When the FR bit in the Status register equals one, both 
even and odd register numbers are valid. 



*See "FPU Instruction Opcode Bit Encoding" at the end of Appendix B.

C.cond.fmt    Floating-Point Compare    C.cond.fmt
(continued)



Operation: 



T: if NaN(ValueFPR(fs, fmt)) or NaN(ValueFPR(ft, fmt)) then
       less <- false
       equal <- false
       unordered <- true
       if cond_3 then
           signal InvalidOperationException
       endif
   else
       less <- ValueFPR(fs, fmt) < ValueFPR(ft, fmt)
       equal <- ValueFPR(fs, fmt) = ValueFPR(ft, fmt)
       unordered <- false
   endif
   condition <- (cond_2 and less) or (cond_1 and equal) or
                (cond_0 and unordered)
   FCR[31]_23 <- condition
   COC[1] <- condition
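The predicate evaluation in this operation can be mirrored in a few lines of Python. This is a minimal sketch under stated assumptions: the function name `c_cond` and its return shape are hypothetical, Python floats stand in for the FPU operands, and the sketch reports the invalid operation condition as a flag rather than modeling the exception itself.

```python
import math


def c_cond(cond, fs, ft):
    """Evaluate a 4-bit cond field against two operands (illustrative only).

    Returns (condition, invalid): `condition` is the value written to the
    condition bit, `invalid` is True when cond bit 3 is set and an operand
    is a NaN.
    """
    if math.isnan(fs) or math.isnan(ft):
        less, equal, unordered = False, False, True
        invalid = bool(cond & 0b1000)      # cond_3: signal on unordered
    else:
        less, equal, unordered = fs < ft, fs == ft, False
        invalid = False
    condition = bool(
        (cond & 0b0100 and less)           # cond_2 selects "less than"
        or (cond & 0b0010 and equal)       # cond_1 selects "equal"
        or (cond & 0b0001 and unordered)   # cond_0 selects "unordered"
    )
    return condition, invalid
```

Because Python's `==` also treats +0.0 and -0.0 as equal, `c_cond(0b0010, 0.0, -0.0)` yields a true condition, matching the statement that comparisons ignore the sign of zero.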



Exceptions: 

Coprocessor unusable 
Floating-Point exception 



Coprocessor Exceptions: 

Unimplemented operation exception 
Invalid operation exception 
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Floating-Point 
Ceiling to Long 
Fixed-Point Format 



CEIL.L.fmt
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Format: 

CEIL.L.fmt fd, fs 

Description: 

The contents of the floating-point register specified by fs are interpreted in
the specified source format, fmt, and arithmetically converted to the long
fixed-point format. The result is placed in the floating-point register
specified by fd.

Regardless of the setting of the current rounding mode, the conversion is
rounded as if the current rounding mode is round to +∞ (2).

This instruction is valid only for conversion from single- or double-
precision floating-point formats. When the FR bit in the Status register
equals one, both even and odd register numbers are valid.

When the source operand is an Infinity, NaN, or the correctly rounded
integer result is outside of -2^63 to 2^63 - 1, the Invalid operation exception is
raised. If the Invalid operation is not enabled then no exception is taken
and 2^63 - 1 is returned.

CEIL.L.fmt    Floating-Point Ceiling to Long    CEIL.L.fmt

Fixed-Point Format
(continued)



Operation: 



T: StoreFPR(fd, L, ConvertFmt(ValueFPR(fs, fmt), fmt, L)) 
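The rounding and range rules for this conversion can be sketched in Python. The sketch is an assumption-laden illustration, not the processor's implementation: `ceil_l` and `invalid_enabled` are hypothetical names, a Python float stands in for the source operand, and a raised `FloatingPointError` stands in for taking the Invalid operation exception.

```python
import math

INT64_MIN = -(2 ** 63)
INT64_MAX = 2 ** 63 - 1


def ceil_l(value, invalid_enabled=False):
    """Convert toward +infinity to a 64-bit fixed-point value (illustrative).

    Returns (result, invalid_flag). When the Invalid operation exception is
    not enabled, an out-of-range, infinite, or NaN source yields 2^63 - 1.
    """
    if math.isnan(value) or math.isinf(value):
        invalid = True
    else:
        result = math.ceil(value)          # exact integer, rounded toward +infinity
        invalid = not (INT64_MIN <= result <= INT64_MAX)
    if invalid:
        if invalid_enabled:
            raise FloatingPointError("Invalid operation")
        return INT64_MAX, True
    return result, False
```

The same structure applies to the single fixed-point variant by substituting the 32-bit limits -2^31 and 2^31 - 1.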



Exceptions: 

Coprocessor unusable exception 
Floating-Point exception 

Coprocessor Exceptions: 

Invalid operation exception 
Unimplemented operation exception 
Inexact exception 
Overflow exception 
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Fixed-Point Format 
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Format: 

CEIL.W.fmt fd, fs 

Description: 

The contents of the floating-point register specified by fs are interpreted in
the specified source format, fmt, and arithmetically converted to the single
fixed-point format. The result is placed in the floating-point register
specified by fd.

Regardless of the setting of the current rounding mode, the conversion is
rounded as if the current rounding mode is round to +∞ (2).

This instruction is valid only for conversion from single- or double-
precision floating-point formats. The operation is not defined if bit 0 of
any register specification is set and the FR bit in the Status register equals
zero, since the register numbers specify an even-odd pair of adjacent
coprocessor general registers. When the FR bit in the Status register equals
one, both even and odd register numbers are valid.

When the source operand is an Infinity or NaN, or the correctly rounded
integer result is outside of -2^31 to 2^31 - 1, the Invalid operation exception is
raised. If the Invalid operation is not enabled then no exception is taken
and 2^31 - 1 is returned.
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CEIL.W.fmt 



Floating-Point 
Ceiling to Single 
Fixed-Point Format 
(continued) 



CEIL.W.fmt 



Operation: 



T: StoreFPR(fd, W, ConvertFmt(ValueFPR(fs, fmt), fmt, W)) 



Exceptions: 

Coprocessor unusable exception 
Floating-Point exception 

Coprocessor Exceptions: 

Invalid operation exception 
Unimplemented operation exception 
Inexact exception 
Overflow exception 
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Format: 

CFC1 rt, fs

Description:

The contents of the FPU control register fs are loaded into general register
rt.

This operation is only defined when fs equals 0 or 31.

The contents of general register rt are undefined for the instruction
immediately following CFC1.

Operation: 



32 T:   temp <- FCR[fs]

   T+1: GPR[rt] <- temp

64 T:   temp <- FCR[fs]

   T+1: GPR[rt] <- (temp_31)^32 || temp
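In 64-bit mode the 32-bit control word is sign-extended: bit 31 of temp is replicated 32 times and concatenated above the word. A minimal Python sketch of that step follows; the function name `sign_extend_32_to_64` is an assumption made for illustration.

```python
def sign_extend_32_to_64(word):
    """Form (word_31)^32 || word as an unsigned 64-bit integer (illustrative)."""
    word &= 0xFFFFFFFF
    # Replicate bit 31 into the upper 32 bits.
    upper = 0xFFFFFFFF if word & 0x80000000 else 0
    return (upper << 32) | word
```

For example, a control word with bit 31 clear is returned unchanged, while 0x80000003 becomes 0xFFFFFFFF80000003.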



Exceptions: 

Coprocessor unusable exception 
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CTC1 



Move Control Word To FPU 
(Coprocessor 1) 



CTC1 
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Format: 

CTC1 rt, fs

Description:

The contents of general register rt are loaded into FPU control register fs.
This operation is only defined when fs equals 0 or 31.

Writing to Control Register 31, the floating-point Control/Status register,
causes an interrupt or exception if any cause bit and its corresponding
enable bit are both set. The register will be written before the exception
occurs. The contents of floating-point control register fs are undefined for
the instruction immediately following CTC1.
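The cause/enable test can be sketched in Python. The bit positions used here are assumptions that are not restated in this entry and should be checked against the Control/Status register description in the FPU chapter: Cause bits V, Z, O, U, I in bits 16..12 with the Unimplemented (E) cause in bit 17, and Enable bits V, Z, O, U, I in bits 11..7. The function name `ctc1_traps` is hypothetical.

```python
CAUSE_SHIFT = 12       # assumed: Cause I bit position; V Z O U I occupy bits 16..12
ENABLE_SHIFT = 7       # assumed: Enable I bit position; V Z O U I occupy bits 11..7
CAUSE_E = 1 << 17      # assumed: Unimplemented operation cause, which has no enable bit


def ctc1_traps(fcr31):
    """Return True if the value written to FCR31 would cause an exception."""
    cause = (fcr31 >> CAUSE_SHIFT) & 0x1F
    enable = (fcr31 >> ENABLE_SHIFT) & 0x1F
    # An exception results when any cause bit and its matching enable bit
    # are both set; the E cause is treated as always enabled.
    return bool(cause & enable) or bool(fcr31 & CAUSE_E)
```

Under these assumptions, writing a value with the V cause bit set traps only when the V enable bit is also set in the same value.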

Operation: 



32 T:   temp <- GPR[rt]

   T+1: FCR[fs] <- temp
        COC[1] <- FCR[31]_23

64 T:   temp <- GPR[rt]_31...0

   T+1: FCR[fs] <- temp
        COC[1] <- FCR[31]_23



Exceptions: 

Coprocessor unusable exception 
Floating-Point exception 

Coprocessor Exceptions: 

Unimplemented operation exception 
Invalid operation exception 
Division by zero exception 
Inexact exception 
Overflow exception 
Underflow exception 
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Floating-Point Format 
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Format: 

CVT.D.fmt fd, fs

Description:

The contents of the floating-point register specified by fs are interpreted in
the specified source format, fmt, and arithmetically converted to the
double binary floating-point format. The result is placed in the floating-
point register specified by fd.

This instruction is valid only for conversions from single floating-point 
format, 32-bit or 64-bit fixed-point format. 

If the single floating-point or single fixed-point format is specified, the 
operation is exact. The operation is not defined if bit 0 of any register 
specification is set and the FR bit in the Status register equals zero, since 
the register numbers specify an even-odd pair of adjacent coprocessor 
general registers. When the FR bit in the Status register equals one, both 
even and odd register numbers are valid. 
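The exactness statement above can be checked numerically: every 32-bit fixed-point value fits in the 53-bit significand of a double, whereas some 64-bit fixed-point values do not. The Python sketch below is an illustration under assumptions: `cvt_d_from_fixed` is a hypothetical name, and Python's `float()` always rounds to nearest/even, so it models only that rounding mode.

```python
def cvt_d_from_fixed(n, bits):
    """Convert a signed fixed-point integer to double (illustrative).

    Returns (result, inexact), where `inexact` is True when the double
    cannot represent n exactly.
    """
    if not -(1 << (bits - 1)) <= n < (1 << (bits - 1)):
        raise ValueError("operand does not fit in the fixed-point format")
    result = float(n)                  # rounds to nearest/even when inexact
    return result, int(result) != n
```

A 32-bit source never sets the inexact flag in this sketch, while a 64-bit source such as 2^53 + 1 does.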

Operation: 

T: StoreFPR (fd, D, ConvertFmt(ValueFPR(fs, fmt), fmt, D)) 

Exceptions: 

Coprocessor unusable exception 
Floating-Point exception 
Coprocessor Exceptions: 

Invalid operation exception 
Unimplemented operation exception 
Inexact exception 
Overflow exception 
Underflow exception

CVT.L.fmt    Floating-Point Convert to Long    CVT.L.fmt

Fixed-Point Format
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Format: 

CVT.L.fmt fd, fs 

Description: 

The contents of the floating-point register specified by fs are interpreted in
the specified source format, fmt, and arithmetically converted to the long
fixed-point format. The result is placed in the floating-point register
specified by fd. This instruction is valid only for conversions from single-
or double-precision floating-point formats. The operation is not defined if
bit 0 of any register specification is set and the FR bit in the Status register
equals zero.

When the source operand is an Infinity, NaN, or the correctly rounded
integer result is outside of -2^63 to 2^63 - 1, the Invalid operation exception is
raised. If the Invalid operation is not enabled then no exception is taken
and 2^63 - 1 is returned.

Operation: 

T: StoreFPR (fd, L, ConvertFmt(ValueFPR(fs, fmt), fmt, L)) 



Exceptions: 

Coprocessor unusable exception 
Floating-Point exception 

Coprocessor Exceptions: 

Invalid operation exception 
Unimplemented operation exception 
Inexact exception 
Overflow exception

CVT.S.fmt    Floating-Point Convert to Single    CVT.S.fmt

Floating-Point Format
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Format: 

CVT.S.fmt fd, fs 
Description: 

The contents of the floating-point register specified by fs are interpreted in
the specified source format, fmt, and arithmetically converted to the single
binary floating-point format. The result is placed in the floating-point
register specified by fd. Rounding occurs according to the currently
specified rounding mode.

This instruction is valid only for conversions from double floating-point
format, or from 32-bit or 64-bit fixed-point format. The operation is not
defined if bit 0 of any register specification is set and the FR bit in the Status
register equals zero, since the register numbers specify an even-odd pair
of adjacent coprocessor general registers. When the FR bit in the Status
register equals one, both even and odd register numbers are valid.

Operation: 



T: StoreFPR(fd, S, ConvertFmt(ValueFPR(fs, fmt), fmt, S)) 



Exceptions: 

Coprocessor unusable exception 
Floating-Point exception 

Coprocessor Exceptions: 

Invalid operation exception 
Unimplemented operation exception 
Inexact exception 
Overflow exception 
Underflow exception

CVT.W.fmt    Floating-Point Convert to Single    CVT.W.fmt

Fixed-Point Format 
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Format: 

CVT.W.fmt fd, fs 
Description: 

The contents of the floating-point register specified by fs are interpreted in
the specified source format, fmt, and arithmetically converted to the single
fixed-point format. The result is placed in the floating-point register
specified by fd. This instruction is valid only for conversion from single-
or double-precision floating-point formats. The operation is not defined if
bit 0 of any register specification is set and the FR bit in the Status register
equals zero, since the register numbers specify an even-odd pair of
adjacent coprocessor general registers. When the FR bit in the Status
register equals one, both even and odd register numbers are valid.

When the source operand is an Infinity or NaN, or the correctly rounded
integer result is outside of -2^31 to 2^31 - 1, an Invalid operation exception is
raised. If Invalid operation is not enabled, then no exception is taken and
2^31 - 1 is returned.

Operation: 

T: StoreFPR(fd, W, ConvertFmt(ValueFPR(fs, fmt), fmt, W)) 

Exceptions: 

Coprocessor unusable exception 
Floating-Point exception 

Coprocessor Exceptions: 

Invalid operation exception 
Unimplemented operation exception 
Inexact exception 
Overflow exception 
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Format: 

DIV.fmt fd, fs, ft 
Description: 

The contents of the floating-point registers specified by fs and ft are
interpreted in the specified format and the value in the fs field is divided by
the value in the ft field. The result is rounded as if calculated to infinite
precision and then rounded to the specified format, according to the
current rounding mode. The result is placed in the floating-point register
specified by fd.

This instruction is valid only for single- or double-precision floating-point
formats.

The operation is not defined if bit 0 of any register specification is set and 
the FR bit in the Status register equals zero, since the register numbers 
specify an even-odd pair of adjacent coprocessor general registers. When 
the FR bit in the Status register equals one, both even and odd register 
numbers are valid. 

Operation: 



T: StoreFPR (fd, fmt, ValueFPR(fs, fmt)/ValueFPR(ft, fmt)) 

Exceptions: 

Coprocessor unusable exception 
Floating-Point exception 
Coprocessor Exceptions:

Unimplemented operation exception
Invalid operation exception
Division-by-zero exception
Inexact exception
Overflow exception
Underflow exception

DMFC1    Doubleword Move From Floating-Point Coprocessor    DMFC1
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Format: 

DMFC1 rt, fs

Description:

The contents of register fs from the floating-point coprocessor are stored
into processor register rt.

The contents of general register rt are undefined for the instruction
immediately following DMFC1.

The FR bit in the Status register specifies whether all 32 registers of the
R4000 are addressable. When FR equals zero, this instruction is not
defined when the least significant bit of fs is non-zero. When FR is set, fs
may specify either odd or even registers.

Operation: 



64 T:   if SR26 = 1 then /* 64-bit wide FGRs */
            data <- FGR[fs]
        elseif fs_0 = 0 then /* valid specifier, 32-bit wide FGRs */
            data <- FGR[fs+1] || FGR[fs]
        else /* undefined for odd 32-bit reg #s */
            data <- undefined^64
        endif

   T+1: GPR[rt] <- data
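The register-pair selection in this operation can be sketched in Python. This is an illustration under assumptions: `dmfc1` is a hypothetical function, the FPU general registers are modeled as a list of integers, and the undefined case is represented by raising `ValueError`, which the hardware does not do.

```python
def dmfc1(fgr, fs, fr):
    """Return the 64-bit value DMFC1 would move to GPR[rt] (illustrative).

    fgr: list of 32 register values (64-bit wide if fr is 1, else 32-bit wide)
    fs:  source register number
    fr:  FR bit of the Status register (SR26)
    """
    if fr:
        return fgr[fs] & 0xFFFFFFFFFFFFFFFF       # 64-bit wide FGRs
    if fs & 1:
        # Result is undefined for odd register numbers when FR = 0.
        raise ValueError("undefined for odd register number when FR = 0")
    # 32-bit wide FGRs: odd register supplies the high word, even the low word.
    return ((fgr[fs + 1] & 0xFFFFFFFF) << 32) | (fgr[fs] & 0xFFFFFFFF)
```

DMTC1 performs the inverse split, writing bits 63..32 to FGR[fs+1] and bits 31..0 to FGR[fs] when FR equals zero.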



Exceptions: 

Coprocessor unusable exception 
Coprocessor Exceptions: 

Unimplemented operation exception 
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Format: 

DMTC1 rt, fs

Description:

The contents of general register rt are loaded into coprocessor register fs of
the CP1.

The contents of floating-point register fs are undefined for the instruction
immediately following DMTC1.

The FR bit in the Status register specifies whether all 32 registers of the
R4000 are addressable. When FR equals zero, this instruction is not
defined when the least significant bit of fs is non-zero. When FR equals
one, fs may specify either odd or even registers.



Operation: 



64 T:   data <- GPR[rt]

   T+1: if SR26 = 1 then /* 64-bit wide FGRs */
            FGR[fs] <- data
        elseif fs_0 = 0 then /* valid specifier, 32-bit wide FGRs */
            FGR[fs+1] <- data_63...32
            FGR[fs] <- data_31...0
        else /* undefined result for odd 32-bit reg #s */
            undefined_result
        endif



Exceptions: 

Coprocessor unusable exception 
Coprocessor Exceptions: 

Unimplemented operation exception

FLOOR.L.fmt



Floating-Point 
Floor to Long 
Fixed-Point Format 
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Format: 

FLOOR.L.fmt fd, fs 

Description: 

The contents of the floating-point register specified by fs are interpreted in
the specified source format, fmt, and arithmetically converted to the long
fixed-point format. The result is placed in the floating-point register
specified by fd.

Regardless of the setting of the current rounding mode, the conversion is
rounded as if the current rounding mode is round to -∞ (3).

This instruction is valid only for conversion from single- or double-
precision floating-point formats.

When the source operand is an Infinity, NaN, or the correctly rounded
integer result is outside of -2^63 to 2^63 - 1, the Invalid operation exception is
raised. If the Invalid operation is not enabled then no exception is taken
and 2^63 - 1 is returned.

FLOOR.L.fmt    Floating-Point Floor to Long    FLOOR.L.fmt

Fixed-Point Format
(continued)

Operation:



T: StoreFPR(fd, L, ConvertFmt(ValueFPR(fs, fmt), fmt, L)) 



Exceptions: 

Coprocessor unusable exception 
Floating-Point exception 

Coprocessor Exceptions: 

Invalid operation exception 
Unimplemented operation exception 
Inexact exception 
Overflow exception

FLOOR.W.fmt    Floating-Point Floor to Single    FLOOR.W.fmt



Fixed-Point Format 
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Format: 

FLOOR.W.fmt fd, fs

Description:

The contents of the floating-point register specified by fs are interpreted in
the specified source format, fmt, and arithmetically converted to the single
fixed-point format. The result is placed in the floating-point register
specified by fd.

Regardless of the setting of the current rounding mode, the conversion is
rounded as if the current rounding mode is round to -∞ (RM = 3).

This instruction is valid only for conversion from single- or double-
precision floating-point formats. The operation is not defined if bit 0 of
any register specification is set and the FR bit in the Status register equals
zero, since the register numbers specify an even-odd pair of adjacent
coprocessor general registers. When the FR bit in the Status register equals
one, both even and odd register numbers are valid.

When the source operand is an Infinity or NaN, or the correctly rounded
integer result is outside of -2^31 to 2^31 - 1, an Invalid operation exception is
raised. If Invalid operation is not enabled, then no exception is taken and
2^31 - 1 is returned.

FLOOR.W.fmt    Floating-Point Floor to Single    FLOOR.W.fmt

Fixed-Point Format
(continued)



Operation: 



T: StoreFPR(fd, W, ConvertFmt(ValueFPR(fs, fmt), fmt, W)) 



Exceptions: 

Coprocessor unusable exception 
Floating-Point exception 

Coprocessor Exceptions: 

Invalid operation exception 
Unimplemented operation exception 
Inexact exception 
Overflow exception

LDC1    Load Doubleword to FPU    LDC1
(Coprocessor 1)
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Format: 

LDC1 ft, offset(base)

Description:

The 16-bit offset is sign-extended and added to the contents of general
register base to form an unsigned effective address.

In 32-bit mode, the contents of the doubleword at the memory location
specified by the effective address are loaded into registers ft and ft+1 of the
floating-point coprocessor. This instruction is not valid, and is undefined,
when the least significant bit of ft is non-zero.

In 64-bit mode, the contents of the doubleword at the memory location
specified by the effective address are loaded into the 64-bit register ft of the
floating-point coprocessor.

The FR bit of the Status register (SR26) specifies whether all 32 registers of
the R4000 are addressable. If FR equals zero, this instruction is not defined
when the least significant bit of ft is non-zero. If FR equals one, ft may
specify either odd or even registers.

If any of the three least-significant bits of the effective address are non-
zero, an address error exception takes place.

LDC1    Load Doubleword to FPU    LDC1
(Coprocessor 1)
(continued)


Operation: 




32 T:  vAddr <- ((offset_15)^16 || offset_15...0) + GPR[base]
64 T:  vAddr <- ((offset_15)^48 || offset_15...0) + GPR[base]

32, 64 (pAddr, uncached) <- AddressTranslation (vAddr, DATA)
       data <- LoadMemory(uncached, DOUBLEWORD, pAddr, vAddr, DATA)
       if SR26 = 1 then /* 64-bit wide FGRs */
           FGR[ft] <- data
       elseif ft_0 = 0 then /* valid specifier, 32-bit wide FGRs */
           FGR[ft+1] <- data_63...32
           FGR[ft] <- data_31...0
       else /* undefined result if odd */
           undefined_result
       endif
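The effective-address formation and the doubleword alignment check can be sketched in Python. This is an illustration only: `ldc1_effective_address` and `addr_bits` are hypothetical names, address translation is not modeled, and a raised `ValueError` stands in for the address error exception.

```python
def ldc1_effective_address(base, offset16, addr_bits=32):
    """Compute vAddr for LDC1/SDC1 and check doubleword alignment (illustrative).

    base:     contents of GPR[base]
    offset16: raw 16-bit offset field from the instruction word
    """
    offset = offset16 & 0xFFFF
    if offset & 0x8000:                  # sign-extend the 16-bit offset
        offset -= 0x10000
    vaddr = (base + offset) & ((1 << addr_bits) - 1)
    if vaddr & 0x7:                      # any of the three low bits set
        raise ValueError("address error exception")
    return vaddr
```

For instance, a base of 0x1010 with an offset field of 0xFFF8 (that is, -8) yields the aligned address 0x1008, while an offset of 4 from an aligned base is rejected.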


Exceptions: 




Coprocessor unusable 
TLB refill exception 
TLB invalid exception 
Bus error exception 



Address error exception 
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LWC1 



Load Word to FPU 
(Coprocessor 1) 



LWC1 
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Format: 

LWC1 ft, offset(base)

Description:

The 16-bit offset is sign-extended and added to the contents of general
register base to form an unsigned effective address. The contents of the
word at the memory location specified by the effective address are loaded
into register ft of the floating-point coprocessor.

The FR bit of the Status register specifies whether all 64-bit Floating-Point
registers are addressable. If FR equals zero, LWC1 loads either the high or
low half of the 16 even Floating-Point registers. If FR equals one, LWC1
loads the low 32-bits of both even and odd Floating-Point registers.

If either of the two least-significant bits of the effective address is non-zero,
an address error exception occurs.

LWC1    Load Word to FPU    LWC1
(Coprocessor 1)
(continued)



Operation: 



32 T:  vAddr <- ((offset_15)^16 || offset_15...0) + GPR[base]
64 T:  vAddr <- ((offset_15)^48 || offset_15...0) + GPR[base]

32, 64 (pAddr, uncached) <- AddressTranslation (vAddr, DATA)
       pAddr <- pAddr_PSIZE-1...3 || (pAddr_2...0 xor (ReverseEndian || 0^2))
       mem <- LoadMemory(uncached, WORD, pAddr, vAddr, DATA)
       byte <- vAddr_2...0 xor (BigEndianCPU || 0^2)
       /* "mem" is aligned 64-bits from memory. Pick out correct bytes. */
       if SR26 = 1 then /* 64-bit wide FGRs */
           FGR[ft] <- undefined^32 || mem_31+8*byte...8*byte
       else /* 32-bit wide FGRs */
           FGR[ft] <- mem_31+8*byte...8*byte
       endif
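The byte-selection step, which picks the addressed word out of the aligned 64-bit quantity returned from memory, can be sketched in Python. The function name `lwc1_select_word` is an assumption, and the sketch presumes the address has already passed the word-alignment check.

```python
def lwc1_select_word(mem64, vaddr, big_endian_cpu):
    """Extract the addressed 32-bit word from an aligned doubleword (illustrative).

    mem64:          64-bit value returned by the memory access
    vaddr:          word-aligned virtual address
    big_endian_cpu: 1 if the processor is operating big-endian, else 0
    """
    # byte <- vAddr_2...0 xor (BigEndianCPU || 0^2)
    byte = (vaddr & 0x7) ^ (int(big_endian_cpu) << 2)
    # mem_31+8*byte...8*byte
    return (mem64 >> (8 * byte)) & 0xFFFFFFFF
```

With a big-endian processor, the word at the lower address comes from bits 63..32 of the doubleword; with a little-endian processor it comes from bits 31..0.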



Exceptions: 

Coprocessor unusable 
TLB refill exception 
TLB invalid exception 
Bus error exception 
Address error exception 
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MFC1 



Move From FPU 
(Coprocessor 1) 
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Format: 

MFC1 rt, fs

Description:

The contents of register fs from the floating-point coprocessor are stored
into processor register rt.

The contents of register rt are undefined for the instruction immediately
following MFC1.

The FR bit of the Status register specifies whether all 32 registers of the
R4000 are addressable. If FR equals zero, MFC1 stores either the high or
low half of the 16 even Floating-Point registers. If FR equals one, MFC1
stores the low 32-bits of both even and odd Floating-Point registers.



Operation: 



32 T:   data <- FGR[fs]_31...0

   T+1: GPR[rt] <- data

64 T:   data <- FGR[fs]_31...0

   T+1: GPR[rt] <- (data_31)^32 || data



Exceptions: 

Coprocessor unusable exception

MOV.fmt    Floating-Point Move    MOV.fmt
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Format: 

MOV.fmt fd, fs 

Description: 

The contents of the FPU register specified by fs are interpreted in the
specified format and are copied into the FPU register specified by fd.

The move operation is non-arithmetic; no IEEE 754 exceptions occur as a 
result of the instruction. 

This instruction is valid only for single- or double-precision floating-point 
formats. 

The operation is not defined if bit 0 of any register specification is set and 
the FR bit in the Status register equals zero, since the register numbers 
specify an even-odd pair of adjacent coprocessor general registers. When 
the FR bit in the Status register equals one, both even and odd register 
numbers are valid. 

Operation: 

T: StoreFPR(fd, fmt, ValueFPR(fs, fmt)) 



Exceptions: 

Coprocessor unusable exception 
Floating-Point exception 

Coprocessor Exceptions: 

Unimplemented operation exception 
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MTC1 



Move To FPU 
(Coprocessor 1) 
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Format: 

MTC1 rt, fs

Description:

The contents of register rt are loaded into the FPU general register at
location fs.

The contents of floating-point register fs are undefined for the instruction
immediately following MTC1.

The FR bit of the Status register specifies whether all 32 registers of the
R4000 are addressable. If FR equals zero, MTC1 loads either the high or
low half of the 16 even Floating-Point registers. If FR equals one, MTC1
loads the low 32-bits of both even and odd Floating-Point registers.



Operation: 



32,64 T:   data <- GPR[rt]_31...0

      T+1: if SR26 = 1 then /* 64-bit wide FGRs */
               FGR[fs] <- undefined^32 || data
           else /* 32-bit wide FGRs */
               FGR[fs] <- data
           endif



Exceptions: 

Coprocessor unusable exception

MUL.fmt    Floating-Point Multiply    MUL.fmt
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Format: 

MUL.fmt fd, fs, ft 
Description: 

The contents of the floating-point registers specified by fs and ft are
interpreted in the specified format and arithmetically multiplied. The
result is rounded as if calculated to infinite precision and then rounded to
the specified format, according to the current rounding mode. The result
is placed in the floating-point register specified by fd.

This instruction is valid only for single- or double-precision floating-point
formats.

The operation is not defined if bit 0 of any register specification is set and
the FR bit in the Status register equals zero, since the register numbers
specify an even-odd pair of adjacent coprocessor general registers. When
the FR bit in the Status register equals one, both even and odd register
numbers are valid.

Operation: 



T: StoreFPR (fd, fmt, ValueFPR(fs, fmt) * ValueFPR(ft, fmt)) 

Exceptions: 

Coprocessor unusable exception 
Floating-Point exception 

Coprocessor Exceptions: 

Unimplemented operation exception 
Invalid operation exception 
Inexact exception 
Overflow exception 
Underflow exception

NEG.fmt    Floating-Point Negate    NEG.fmt



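 31      26 25    21 20    16 15    11 10     6 5       0
+----------+--------+--------+--------+--------+---------+
|   COP1   |  fmt   |   0    |   fs   |   fd   |   NEG   |
|  010001  |        | 00000  |        |        | 000111  |
+----------+--------+--------+--------+--------+---------+
     6         5        5        5        5         6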
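The field layout above can be turned into an instruction word with shifts and ORs. The Python sketch below is illustrative: `encode_neg` is a hypothetical helper, and the fmt values 16 (single) and 17 (double) are assumptions taken from the opcode encoding table referenced at the end of Appendix B rather than from this entry.

```python
COP1 = 0b010001        # opcode field, bits 31..26
NEG_FUNCT = 0b000111   # function field, bits 5..0
FMT_S = 16             # assumed fmt code for single precision
FMT_D = 17             # assumed fmt code for double precision


def encode_neg(fmt, fd, fs):
    """Assemble the 32-bit word for NEG.fmt fd, fs from its fields."""
    for name, value in (("fmt", fmt), ("fd", fd), ("fs", fs)):
        if not 0 <= value < 32:
            raise ValueError(name + " must fit in 5 bits")
    return (
        (COP1 << 26)       # bits 31..26
        | (fmt << 21)      # bits 25..21
        | (0 << 16)        # bits 20..16 are zero
        | (fs << 11)       # bits 15..11
        | (fd << 6)        # bits 10..6
        | NEG_FUNCT        # bits 5..0
    )
```

Under these assumptions, `hex(encode_neg(FMT_S, fd=2, fs=4))` gives `0x46002087`.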



Format: 

NEG.fmt fd, fs 

Description: 

The contents of the FPU register specified by fs are interpreted in the
specified format and the arithmetic negation is taken (polarity of the sign-
bit is changed). The result is placed in the FPU register specified by fd.

The negate operation is arithmetic; a NaN operand signals invalid
operation.

This instruction is valid only for single- or double-precision floating-point 
formats. The operation is not defined if bit 0 of any register specification 
is set and the FR bit in the Status register equals zero, since the register 
numbers specify an even-odd pair of adjacent coprocessor general 
registers. When the FR bit in the Status register equals one, both even and 
odd register numbers are valid. 

Operation: 



T: StoreFPR(fd, fmt, Negate(ValueFPR(fs, fmt))) 
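For non-NaN operands, negation amounts to inverting the sign bit of the stored value. The Python sketch below shows this for the single-precision format; the helper names are assumptions, and the NaN case, which the description says signals invalid operation, is deliberately not modeled.

```python
import struct


def single_to_bits(x):
    """Return the 32-bit IEEE 754 single-precision pattern of x."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def neg_s_bits(bits):
    """Invert the sign bit (bit 31) of a single-precision bit pattern."""
    return (bits ^ 0x80000000) & 0xFFFFFFFF
```

Applying `neg_s_bits` to the pattern for 1.5 gives the pattern for -1.5, and applying it to +0 gives -0.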



Exceptions: 

Coprocessor unusable exception 
Floating-Point exception 

Coprocessor Exceptions: 

Unimplemented operation exception 
Invalid operation exception 
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ROUND.L.fmt ROUND.L.fmt 

Fixed-Point Format 



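 31      26 25    21 20    16 15    11 10     6 5       0
+----------+--------+--------+--------+--------+---------+
|   COP1   |  fmt   |   0    |   fs   |   fd   | ROUND.L |
|  010001  |        | 00000  |        |        | 001000  |
+----------+--------+--------+--------+--------+---------+
     6         5        5        5        5         6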



Format: 

ROUND.L.fmt fd, fs 

Description: 

The contents of the floating-point register specified by fs are interpreted in
the specified source format, fmt, and arithmetically converted to the long
fixed-point format. The result is placed in the floating-point register
specified by fd.

Regardless of the setting of the current rounding mode, the conversion is
rounded as if the current rounding mode is round to nearest/even (0).

This instruction is valid only for conversion from single- or double-
precision floating-point formats.

When the source operand is an Infinity, NaN, or the correctly rounded
integer result is outside of -2^63 to 2^63 - 1, the Invalid operation exception is
raised. If the Invalid operation is not enabled then no exception is taken
and 2^63 - 1 is returned.

ROUND.L.fmt    Floating-Point Round to Long    ROUND.L.fmt

Fixed-Point Format
(continued)

Operation:



T: StoreFPR(fd, L, ConvertFmt(ValueFPR(fs, fmt), fmt, L)) 



Exceptions: 

Coprocessor unusable exception 
Floating-Point exception 

Coprocessor Exceptions: 

Invalid operation exception 
Unimplemented operation exception 
Inexact exception 
Overflow exception

ROUND.W.fmt    Floating-Point    ROUND.W.fmt

Round to Single 
Fixed-Point Format 
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Format: 

ROUND.W.fmt fd, fs 

Description: 

The contents of the floating-point register specified by fs are interpreted in
the specified source format, fmt, and arithmetically converted to the single
fixed-point format. The result is placed in the floating-point register
specified by fd.

Regardless of the setting of the current rounding mode, the conversion is
rounded as if the current rounding mode is round to the nearest/even
(RM = 0).

This instruction is valid only for conversion from single- or double-
precision floating-point formats. The operation is not defined if bit 0 of
any register specification is set and the FR bit in the Status register equals
zero, since the register numbers specify an even-odd pair of adjacent
coprocessor general registers. When the FR bit in the Status register equals
one, both even and odd register numbers are valid.

When the source operand is an Infinity or NaN, or the correctly rounded
integer result is outside of -2^31 to 2^31 - 1, an Invalid operation exception is
raised. If Invalid operation is not enabled, then no exception is taken and
2^31 - 1 is returned.

ROUND.W.fmt    Floating-Point    ROUND.W.fmt

Round to Single
Fixed-Point Format
(continued)

Operation:



T: StoreFPR(fd, W, ConvertFmt(ValueFPR(fs, fmt), fmt, W)) 
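Round to nearest/even resolves ties toward the even integer, which differs from the familiar "round half up" rule. Python's built-in `round` on a float follows the same tie rule, so it can serve as a stand-in in a short sketch. The name `round_w` is hypothetical, and the Invalid operation case is reported as a flag rather than modeled as an exception.

```python
import math

INT32_MIN = -(2 ** 31)
INT32_MAX = 2 ** 31 - 1


def round_w(value):
    """Round to nearest/even into a 32-bit fixed-point value (illustrative).

    Returns (result, invalid_flag); an invalid source yields 2^31 - 1,
    as described for the case where Invalid operation is not enabled.
    """
    if math.isnan(value) or math.isinf(value):
        return INT32_MAX, True
    result = round(value)                # ties go to the even integer
    if not INT32_MIN <= result <= INT32_MAX:
        return INT32_MAX, True
    return result, False
```

Both 1.5 and 2.5 round to 2 under this rule, while 3.5 rounds to 4.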



Exceptions: 

Coprocessor unusable exception 
Floating-Point exception 

Coprocessor Exceptions: 

Invalid operation exception 
Unimplemented operation exception 
Inexact exception 
Overflow exception

SDC1    Store Doubleword from FPU    SDC1
(Coprocessor 1)



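 31      26 25    21 20    16 15                        0
+----------+--------+--------+---------------------------+
|   SDC1   |  base  |   ft   |          offset           |
|  111101  |        |        |                           |
+----------+--------+--------+---------------------------+
     6         5        5                 16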





Format: 

SDC1 ft, offset(base)

Description:

The 16-bit offset is sign-extended and added to the contents of general
register base to form an unsigned effective address.

In 32-bit mode, the contents of registers ft and ft+1 from the floating-point
coprocessor are stored at the memory location specified by the effective
address. This instruction is not valid, and is undefined, when the least
significant bit of ft is non-zero.

In 64-bit mode, the contents of the 64-bit register ft are stored to the
doubleword at the memory location specified by the effective address.

The FR bit of the Status register (SR26) specifies whether all 32 registers of
the R4000 are addressable. When FR equals zero, this instruction is not
defined if the least significant bit of ft is non-zero. If FR equals one, ft may
specify either odd or even registers.

If any of the three least-significant bits of the effective address are non-
zero, an address error exception takes place.

SDC1    Store Doubleword from FPU    SDC1
(Coprocessor 1)
(continued)

Operation:



32      T:  vAddr ← ((offset_15)^16 || offset_15...0) + GPR[base]
64      T:  vAddr ← ((offset_15)^48 || offset_15...0) + GPR[base]

32, 64      (pAddr, uncached) ← AddressTranslation (vAddr, DATA)

            if SR26 = 1 then          /* 64-bit wide FGRs */
                data ← FGR[ft]
            elseif ft_0 = 0 then      /* valid specifier, 32-bit wide FGRs */
                data ← FGR[ft+1] || FGR[ft]
            else                      /* undefined for odd 32-bit reg #s */
                data ← undefined^64
            endif

            StoreMemory(uncached, DOUBLEWORD, data, pAddr, vAddr, DATA)
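
The address formation and the two validity rules for SDC1 can be sketched as follows. The function names, the `addr_bits` parameter, and the string results are assumptions made for this illustration; address translation and the store itself are not modeled.

```python
def sdc1_effective_address(base: int, offset: int, addr_bits: int = 32) -> int:
    """Sign-extend the 16-bit offset and add it to the base register value."""
    offset &= 0xFFFF
    if offset & 0x8000:          # bit 15 set: the offset is negative
        offset -= 0x10000
    return (base + offset) & ((1 << addr_bits) - 1)

def sdc1_check(ft: int, vaddr: int, fr: int) -> str:
    """Classify an SDC1 according to the rules in the description."""
    if vaddr & 0x7:              # any of the three low address bits set
        return "address error"
    if fr == 0 and (ft & 1):     # odd register with 32-bit wide FGRs
        return "undefined"
    return "ok"
```

For example, a base of 0x1000 with an offset field of 0xFFF8 (that is, -8) gives the doubleword-aligned address 0xFF8.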

Exceptions: 

Coprocessor unusable 
TLB refill exception 
TLB invalid exception 
TLB modification exception 
Bus error exception 
Address error exception 
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SQRT.fmt 



Floating-Point 
Square Root 
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Format: 

SQRT.fmt fd, fs 
Description: 

The contents of the floating-point register specified by fs are interpreted in
the specified format and the positive arithmetic square root is taken. The
result is rounded as if calculated to infinite precision and then rounded to
the specified format, according to the current rounding mode. If the value
of fs corresponds to -0, the result will be -0. The result is placed in the
floating-point register specified by fd.

This instruction is valid only for single- or double-precision floating-point 
formats. 

The operation is not defined if bit 0 of any register specification is set and 
the FR bit in the Status register equals zero, since the register numbers 
specify an even-odd pair of adjacent coprocessor general registers. When 
the FR bit in the Status register equals one, both even and odd register 
numbers are valid. 
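
A small Python sketch of the -0 rule for SQRT.fmt follows. The name `sqrt_fmt` and the `(result, invalid)` return pair are assumptions for this example; treating a negative operand as an Invalid operation follows IEEE 754 and is not spelled out in the description above.

```python
import math

def sqrt_fmt(value: float) -> tuple[float, bool]:
    """Return (result, invalid) for a square root in the style of SQRT.fmt."""
    if value == 0.0:
        # Both +0 and -0 compare equal to 0.0; returning the operand
        # unchanged keeps its sign, so the square root of -0 is -0.
        return value, False
    if value < 0.0:
        # Assumption (IEEE 754): a negative operand is an invalid operation.
        return math.nan, True
    return math.sqrt(value), False
```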

Operation: 



T: StoreFPR(fd, fmt, SquareRoot(ValueFPR(fs, fmt))) 



Exceptions: 

Coprocessor unusable exception 
Floating-Point exception 

Coprocessor Exceptions: 

Unimplemented operation exception
Invalid operation exception
Inexact exception

SUB.fmt    Floating-Point Subtract    SUB.fmt
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Format: 

SUB.fmt fd, fs, ft 



Description: 

The contents of the floating-point registers specified by fs and ft are
interpreted in the specified format and the value in the ft field is subtracted
from the value in the fs field. The result is rounded as if calculated to
infinite precision and then rounded to the specified format, according to
the current rounding mode. The result is placed in the floating-point
register specified by fd. This instruction is valid only for single- or
double-precision floating-point formats.

The operation is not defined if bit 0 of any register specification is set and 
the FR bit in the Status register equals zero, since the register numbers 
specify an even-odd pair of adjacent coprocessor general registers. When 
the FR bit in the Status register equals one, both even and odd register 
numbers are valid. 

Operation: 

T: StoreFPR (fd, fmt, ValueFPR(fs, fmt) - ValueFPR(ft, fmt)) 

Exceptions: 

Coprocessor unusable exception 
Floating-Point exception 

Coprocessor Exceptions: 

Unimplemented operation exception 
Invalid operation exception 
Inexact exception 
Overflow exception 
Underflow exception

Store Word from FPU
(Coprocessor 1)



SWC1 
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Format: 

SWC1 ft, offset(base)

Description:

The 16-bit offset is sign-extended and added to the contents of general
register base to form an unsigned effective address. The contents of
register ft from the floating-point coprocessor are stored at the memory
location specified by the effective address.

The FR bit of the Status register specifies whether all 64-bit floating-point
registers are addressable.

If FR equals zero, SWC1 stores either the high or low half of the 16 even
floating-point registers.

If FR equals one, SWC1 stores the low 32 bits of both even and odd
floating-point registers.

If either of the two least-significant bits of the effective address is
non-zero, an address error exception occurs.



MIPS R4000 Microprocessor User's Manual 






Appendix B 



SWC1    Store Word from FPU (Coprocessor 1)    SWC1
(continued)



Operation: 



32      T:  vAddr ← ((offset_15)^16 || offset_15...0) + GPR[base]
64      T:  vAddr ← ((offset_15)^48 || offset_15...0) + GPR[base]

32, 64      (pAddr, uncached) ← AddressTranslation (vAddr, DATA)
            pAddr ← pAddr_PSIZE-1...3 || (pAddr_2...0 xor (ReverseEndian || 0^2))
            byte ← vAddr_2...0 xor (BigEndianCPU || 0^2)
            /* the bytes of the word are put in the correct byte lanes in
             * "data" for a 64-bit path to memory */
            if SR26 = 1 then          /* 64-bit wide FGRs */
                data ← FGR[ft]_63-8*byte...0 || 0^(8*byte)
            else                      /* 32-bit wide FGRs */
                data ← 0^(32-8*byte) || FGR[ft] || 0^(8*byte)
            endif

            StoreMemory (uncached, WORD, data, pAddr, vAddr, DATA)
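
The byte-lane selection in the SWC1 operation can be illustrated with a short Python sketch. The function name `swc1_data` and its parameters are assumptions for this example; it models only the `byte` and `data` computation for a word-aligned address, not address translation or the store itself.

```python
MASK64 = (1 << 64) - 1

def swc1_data(fgr_value: int, vaddr: int, big_endian_cpu: bool, fr: int) -> int:
    """Place the word to be stored in its lane of a 64-bit data path."""
    # byte <- vAddr[2..0] xor (BigEndianCPU || 0^2)
    byte = (vaddr & 0x7) ^ (0b100 if big_endian_cpu else 0)
    shift = 8 * byte
    if fr == 1:
        # 64-bit wide FGR: keep bits 63-8*byte..0, append 8*byte zero bits.
        return (fgr_value << shift) & MASK64
    # 32-bit wide FGR: zero-extend the word and shift it into its lane.
    return ((fgr_value & 0xFFFFFFFF) << shift) & MASK64
```

With a word-aligned address `byte` is either 0 or 4, so the word lands in either the low or the high half of the 64-bit path.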



Exceptions: 

Coprocessor unusable 
TLB refill exception 
TLB invalid exception 
TLB modification exception 

Bus error exception 
Address error exception 
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Fixed-Point Format 
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Format: 

TRUNC.L.fmt fd, fs 

Description: 

The contents of the floating-point register specified by fs are interpreted in
the specified source format, fmt, and arithmetically converted to the long
fixed-point format. The result is placed in the floating-point register
specified by fd.

Regardless of the setting of the current rounding mode, the conversion is
rounded as if the current rounding mode is round toward zero (1).

This instruction is valid only for conversion from single- or
double-precision floating-point formats.

When the source operand is an Infinity, NaN, or the correctly rounded
integer result is outside of -2^63 to 2^63 - 1, the Invalid operation exception is
raised. If the Invalid operation is not enabled then no exception is taken
and 2^63 - 1 is returned.

TRUNC.L.fmt    Floating-Point Truncate to Long Fixed-Point Format    TRUNC.L.fmt
(continued)

Operation:



T: StoreFPR(fd, L, ConvertFmt(ValueFPR(fs, fmt), fmt, L)) 
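
As a rough illustration of the TRUNC.L.fmt rule, the sketch below truncates toward zero and applies the 64-bit range check. The name `trunc_l` and the `(result, invalid)` pair are assumptions for this example, and only the case where the Invalid operation exception is not enabled is modeled.

```python
import math

INT64_MIN = -2**63
INT64_MAX = 2**63 - 1

def trunc_l(value: float) -> tuple[int, bool]:
    """Model TRUNC.L.fmt with the Invalid operation exception disabled."""
    if math.isnan(value) or math.isinf(value):
        return INT64_MAX, True
    result = math.trunc(value)       # discards the fraction: round toward zero
    if result < INT64_MIN or result > INT64_MAX:
        return INT64_MAX, True
    return result, False
```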



Exceptions: 

Coprocessor unusable exception 
Floating-Point exception 

Coprocessor Exceptions: 

Invalid operation exception 
Unimplemented operation exception 
Inexact exception 
Overflow exception 
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Fixed-Point Format 
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Format: 

TRUNC.W.fmt fd, fs 

Description: 

The contents of the FPU register specified by fs are interpreted in the
specified source format fmt and arithmetically converted to the single
fixed-point format. The result is placed in the FPU register specified by fd.

Regardless of the setting of the current rounding mode, the conversion is
rounded as if the current rounding mode is round toward zero (RM = 1).

This instruction is valid only for conversion from single- or
double-precision floating-point formats. The operation is not defined if bit 0 of
any register specification is set and the FR bit in the Status register equals
zero, since the register numbers specify an even-odd pair of adjacent
coprocessor general registers. When the FR bit in the Status register equals
one, both even and odd register numbers are valid.

When the source operand is an Infinity or NaN, or the correctly rounded
integer result is outside of -2^31 to 2^31 - 1, an Invalid operation exception is
raised. If Invalid operation is not enabled, then no exception is taken and
-2^31 is returned.




TRUNC.W.fmt Floating-Point TRUNC.W.fmt 

Truncate to Single 
Fixed-Point Format 
(continued) 



Operation: 

T: StoreFPR(fd, W, ConvertFmt(ValueFPR(fs, fmt), fmt, W)) 



Exceptions: 

Coprocessor unusable exception 
Floating-Point exception 

Coprocessor Exceptions: 

Invalid operation exception 
Unimplemented operation exception 
Inexact exception 
Overflow exception 
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FPU Instruction Opcode Bit Encoding 



                            Opcode
31...29 \ 28...26    0       1       2       3       4       5       6       7
    0
    1
    2                        COP1
    3
    4
    5
    6                        LWC1                            LDC1
    7                        SWC1                            SDC1

                            sub
25...24 \ 23...21    0       1       2       3       4       5       6       7
    0                MF      DMFη    CF      δ       MT      DMTη    CT      δ
    1                BC      δ       δ       δ       δ       δ       δ       δ
    2                S       D       δ       δ       W       Lη      δ       δ
    3                δ       δ       δ       δ       δ       δ       δ       δ

                            br
20...19 \ 18...16    0       1       2       3       4       5       6       7
    0                BCF     BCT     BCFL    BCTL    γ       γ       γ       γ
    1                γ       γ       γ       γ       γ       γ       γ       γ
    2                γ       γ       γ       γ       γ       γ       γ       γ
    3                γ       γ       γ       γ       γ       γ       γ       γ

Figure B-3 Bit Encoding for FPU Instructions 






                            function
5...3 \ 2...0    0          1          2          3          4          5          6          7
    0            ADD        SUB        MUL        DIV        SQRT       ABS        MOV        NEG
    1            ROUND.Lη   TRUNC.Lη   CEIL.Lη    FLOOR.Lη   ROUND.W    TRUNC.W    CEIL.W     FLOOR.W
    2            δ          δ          δ          δ          δ          δ          δ          δ
    3            δ          δ          δ          δ          δ          δ          δ          δ
    4            CVT.S      CVT.D      δ          δ          CVT.W      CVT.Lη     δ          δ
    5            δ          δ          δ          δ          δ          δ          δ          δ
    6            C.F        C.UN       C.EQ       C.UEQ      C.OLT      C.ULT      C.OLE      C.ULE
    7            C.SF       C.NGLE     C.SEQ      C.NGL      C.LT       C.NGE      C.LE       C.NGT


Figure B-3 (cont.) Bit Encoding for FPU Instructions 
Key: 

γ Operation codes marked with a gamma cause a reserved
instruction exception. They are reserved for future versions of the
architecture.

δ Operation codes marked with a delta cause unimplemented
operation exceptions in all current implementations and are
reserved for future versions of the architecture.

η Operation codes marked with an eta are valid only when MIPS III
instructions are enabled. Any attempt to execute these without
MIPS III instructions enabled causes an unimplemented operation
exception.
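
The field positions in Figure B-3 can be exercised with a small decoder sketch. This is a partial illustration, not a complete disassembler: the function name `decode_fpu`, the dictionaries, and the returned strings are assumptions for this example, and only the major opcodes and the S and D arithmetic functions in the first row of the function table are decoded.

```python
# Major opcodes, bits 31..26 (row = bits 31..29, column = bits 28..26).
MAJOR_OPCODES = {
    0b010001: "COP1",
    0b110001: "LWC1",
    0b110101: "LDC1",
    0b111001: "SWC1",
    0b111101: "SDC1",
}
# sub field, bits 25..21: row 2 of the sub table, columns 0 and 1.
FORMATS = {0b10000: "S", 0b10001: "D"}
# function field, bits 5..0: row 0 of the function table.
ARITHMETIC = {0: "ADD", 1: "SUB", 2: "MUL", 3: "DIV",
              4: "SQRT", 5: "ABS", 6: "MOV", 7: "NEG"}

def decode_fpu(word: int) -> str:
    """Name the FPU instruction in a 32-bit word, as far as this sketch goes."""
    name = MAJOR_OPCODES.get((word >> 26) & 0x3F)
    if name is None:
        return "not an FPU instruction"
    if name != "COP1":
        return name
    fmt = FORMATS.get((word >> 21) & 0x1F)
    op = ARITHMETIC.get(word & 0x3F)
    if fmt is None or op is None:
        return "COP1 (not decoded by this sketch)"
    return f"{op}.{fmt}"
```

For instance, the word 0x46062080 has opcode 010001, sub 10000, and function 000000, so the sketch reports it as ADD.S.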
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A block of data elements (whether bytes, halfwords, words, or 
doublewords) can be retrieved from storage in two ways: in sequential 
order, or using a subblock order. This chapter describes these retrieval 
methods, with an emphasis on subblock ordering.

C.1 Sequential Ordering



Sequential ordering retrieves the data elements of a block in serial, or 
sequential, order. 

Figure C-1 shows a sequential order in which byte 0 is taken first and byte 
7 is taken last. 



(Figure: a block of eight bytes, Byte 0 through Byte 7. Byte 0 is taken
first, Byte 1 second, Byte 2 third, Byte 3 fourth, Byte 4 fifth, Byte 5
sixth, Byte 6 seventh, and Byte 7 last.)



Figure C-1 Retrieving a Data Block in Sequential Order 

C.2 Subblock Ordering 

Subblock ordering allows the system to define the order in which the data
elements are retrieved. The smallest data element of a block transfer for
the R4000 is a doubleword, and Figure C-2 shows the retrieval of a block of
data that consists of 8 doublewords, in which DW2 is taken first.

(Figure: a hexword (block) of eight doublewords, DW0 through DW7, divided
into two octalwords of two quadwords each. Order of retrieval: DW2 is taken
first, DW3 second, DW0 third, DW1 fourth, DW6 fifth, DW7 sixth, DW4
seventh, and DW5 eighth.)

Figure C-2 Retrieving Data in a Subblock Order

Using the subblock ordering shown in Figure C-2, the doubleword at the
target address is retrieved first (DW2), followed by the remaining
doubleword (DW3) in this quadword.

Next, the doublewords of the quadword that fills out the octalword are
retrieved in the same order as the prior quadword (in this case DW0 is
followed by DW1). This is followed by the remaining octalword (DW6, DW7,
DW4, DW5), which fills out the hexword.

It may be easier to understand subblock ordering by looking at
the method used for generating the address of each doubleword as it is
retrieved. The subblock ordering logic generates this address by
executing a bitwise exclusive-OR (XOR) of the starting block address with
the output of a binary counter that increments with each doubleword,
starting at doubleword zero (000₂).

Using this scheme, Tables C-1 through C-3 list the subblock ordering
of doublewords for a 32-word block, based on three different
starting-block addresses: 0010₂, 1011₂, and 0101₂. The subblock ordering is
generated by an XOR of the subblock address (0010₂, 1011₂, or
0101₂) with the binary count of the doubleword (0000₂ through 1111₂).
Thus, the eighth doubleword retrieved from a block of data with a starting
address of 0010₂ is found by taking the XOR of address 0010₂ with the
binary count of the eighth transfer, 0111₂. The result is 0101₂, or DW5
(shown in Table C-1).
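
The XOR scheme is compact enough to express in a few lines of Python. The function name `subblock_order` and its parameters are assumptions for this sketch; it returns the doubleword indices in the order in which they would be retrieved.

```python
def subblock_order(start: int, count: int) -> list[int]:
    """Doubleword indices in subblock order for a block of `count` doublewords.

    `start` is the index of the doubleword at the target address; `count`
    is assumed to be a power of two, with `start` less than `count`.
    """
    # XOR the starting address with a binary counter that runs from zero.
    return [start ^ n for n in range(count)]
```

With `start = 0b0010` and `count = 8`, the result is DW2, DW3, DW0, DW1, DW6, DW7, DW4, DW5, which is the order described for Figure C-2. Because XOR with a fixed value is a one-to-one mapping, every doubleword in the block is visited exactly once.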

The remaining tables illustrate this method of subblock ordering, using
various address permutations.

Table C-1 Sequence of Doublewords Transferred Using Subblock Ordering: Address 0010₂

Cycle    Starting Block Address    Binary Count    Double Word Retrieved
  1      0010                      0000            0010
  2      0010                      0001            0011
  3      0010                      0010            0000
  4      0010                      0011            0001

  5      0010                      0100            0110
  6      0010                      0101            0111
  7      0010                      0110            0100
  8      0010                      0111            0101

  9      0010                      1000            1010
 10      0010                      1001            1011
 11      0010                      1010            1000
 12      0010                      1011            1001

 13      0010                      1100            1110
 14      0010                      1101            1111
 15      0010                      1110            1100
 16      0010                      1111            1101

Table C-2 Sequence of Doublewords Transferred Using Subblock Ordering: Address 1011₂

Cycle    Starting Block Address    Binary Count    Double Word Retrieved
  1      1011                      0000            1011
  2      1011                      0001            1010
  3      1011                      0010            1001
  4      1011                      0011            1000

  5      1011                      0100            1111
  6      1011                      0101            1110
  7      1011                      0110            1101
  8      1011                      0111            1100

  9      1011                      1000            0011
 10      1011                      1001            0010
 11      1011                      1010            0001
 12      1011                      1011            0000

 13      1011                      1100            0111
 14      1011                      1101            0110
 15      1011                      1110            0101
 16      1011                      1111            0100

Table C-3 Sequence of Doublewords Transferred Using Subblock Ordering: Address 0101₂

Cycle    Starting Block Address    Binary Count    Double Word Retrieved
  1      0101                      0000            0101
  2      0101                      0001            0100
  3      0101                      0010            0111
  4      0101                      0011            0110

  5      0101                      0100            0001
  6      0101                      0101            0000
  7      0101                      0110            0011
  8      0101                      0111            0010

  9      0101                      1000            1101
 10      0101                      1001            1100
 11      0101                      1010            1111
 12      0101                      1011            1110

 13      0101                      1100            1001
 14      0101                      1101            1000
 15      0101                      1110            1011
 16      0101                      1111            1010
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The speed of the R4000 output drivers is controlled by a negative feedback 
loop that ensures the drive-off times are only as fast as necessary to meet
the system requirement for single cycle transfers. This guarantees the
minimum ground bounce from L*(Δi/Δt) of the switching buffers,
consistent with the system timing requirements.

D.1 Mode Bits

Four bits are used to control the pull-up and pull-down delays. These bits
are initially set to the values in the mode bits InitN(3:0) for pull-up and
InitP(3:0) for pull-down. If the Δi/Δt control mechanism is enabled, it is
recommended to load the mode bits InitP(3:0) and InitN(3:0) to the values
which provide the slowest slew rate.

Under normal conditions, the Δi/Δt control mechanism is enabled to
compensate the output buffer delay for any changes in the temperature or
power supply voltage. The EnblDPLL mode bit is set for this mode of
operation.

For situations where the jitter associated with the operation of the Δi/Δt
control mechanism cannot be tolerated and where the variation in
temperature and supply voltage after ColdReset* is expected to be small,
the Δi/Δt control mechanism can be instructed to lock during ColdReset*
and thereafter retain its control values. The EnblDPLLR mode bit is set
and EnblDPLL is cleared for this mode of operation.

In addition, if both the EnblDPLL and EnblDPLLR mode bits are cleared,
the speed of the output buffers is set by the InitP(3:0) and InitN(3:0)
mode bits.

D.2 Delay Times 

Currently, delays of 0.5T, 0.75T, and T are supported, corresponding to the
Drv0_50, Drv0_75, and Drv1_00 mode bits, where T is the MasterClock
period. For example, in Drv0_75 mode, the entire signal transmission path
including the clock-to-Q, output buffer drive time, board flight time, input
buffer delay, and setup time will be traversed in 0.75 * the MasterClock
period, plus or minus the jitter due to the Δi/Δt control mechanism.
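
As a worked illustration of these delay settings, the sketch below computes the nominal time allotted to the whole signal path for a given MasterClock frequency. The function name, the dictionary, and the 50 MHz figure in the usage note are assumptions chosen for the example; jitter from the control mechanism is not included.

```python
# Fraction of the MasterClock period allotted by each drive-time mode bit.
DRIVE_MODES = {"Drv0_50": 0.5, "Drv0_75": 0.75, "Drv1_00": 1.0}

def path_budget_ns(master_clock_mhz: float, mode: str) -> float:
    """Nominal time, in nanoseconds, for the entire signal transmission path."""
    period_ns = 1000.0 / master_clock_mhz    # T, the MasterClock period
    return DRIVE_MODES[mode] * period_ns
```

With a hypothetical 50 MHz MasterClock (T = 20 ns), Drv0_75 mode gives a nominal budget of 15 ns for clock-to-Q, drive time, flight time, input buffer delay, and setup time combined.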

All output drivers on the R4000, with the exception of the clock drivers, are
controlled by the Δi/Δt control mechanism. The delay due to the output
buffer drive time component of the SCAddr(17:0), SCOEB, SCWRB,
SCDCSB, and SCTCSB pins is approximately 66% of the delay of drivers
of the other pins.

By measuring the transmission line delay of the trace that connects the 
R4000 IO_Out and IO_In pins, the R4000 determines the worst case 
propagation delay from an R4000 output driver to a receiving device. This 
representative trace must have one and a half times the length and 
approximately the same capacitive loading as the worst case trace on any 
R4000 output. 
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The designer determines the trace characteristics by: 

• measuring the longest path from an R4000 output driver to a 
receiving device, L 

• calculating the maximum capacitive loading on any signal pin, 
C 

• connecting an incident-wave trace of length L with a capacitive 
loading of C between the IO_In and IO_Out pins of the R4000 

• connecting a reflected wave trace of length L/2 to the IO_In 
pin of the R4000. 

An R4000 with appropriate traces connected to the IO_In and IO_Out 
pins is illustrated in Figure D-1. 



(Figure: a CPU board carrying the R4000. An "incident wave" trace of
length L and load C, and a "reflected wave" trace of length L/2, are
attached to the IO_In and IO_Out pins. L = a + b + c + d is the longest
trace from an R4000 output driver to a receiving device; C is the total
capacitance loading of the worst case trace.)

Figure D-1 IO_In/IO_Out Board Trace
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The Phase Locked Loop circuit requires several passive components for 
proper operation, which are connected to PLLCap0, PLLCap1, VccP, and
VssP, as illustrated in Figure E-1.

In addition, the capacitors for PLLCap0 (Cp) and PLLCap1 (Cp) can be
connected to either VssP (as shown), VccP, or one to VssP and one to
VccP. Note that C2 and the Cp capacitors are incorporated into both the
179PGA and 447PGA package designs as surface-mounted chip
capacitors.

Figure E-1 PLL Passive Components

Figure E-2 shows a top view of the 179-pin package with capacitors.



x: Vss-Vcc Bypass Caps 
C2: VssP-VccP Bypass Caps 
%1, %2: PLL Caps 




Figure E-2 179-Pin Package 

Figure E-3 shows a top view of the 447-pin package with chip capacitors.



x: Vss-Vcc Bypass Caps 
C2: VssP-VccP Bypass Caps 
%1, %2: PLL Caps 




Figure E-3 447-Pin Package 

It is essential to isolate the analog power and ground for the PLL circuit
(VccP/VssP) from the regular power and ground (Vcc/Vss). Initial
evaluations have yielded good results with the following values:

R = 5 ohms    C1 = 1 nF    C2 = 82 nF    C3 = 10 µF    Cp = 470 pF

Since the optimum values for the filter components depend upon the
application and the system noise environment, these values should be
considered as starting points for further experimentation within your
specific application. In addition, chokes (inductors: L) can be
considered as an alternative to the resistors (R) for filtering
the power supply.
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The contents of the System Coprocessor registers and the TLB affect the 
operation of the processor in many ways. For instance, an instruction that
changes CP0 data also affects subsequent instructions that use the data.

In the CPU, general registers are interlocked and the result of an
instruction can generally be used by the next instruction; if the result is not
available right away, the processor stalls until it is available. CP0 registers
and the TLB are not interlocked, however; there may be some delay before
a value written by one instruction is available to following instructions.

There is a required-data dependence between an instruction that changes a
register or TLB entry (a writer) and the next instruction that uses it (a user).
(A writer can write multiple data items, forming multiple writer/user
pairs.) The writer/user instruction pair places a hazard on the data if there
must be a delay between the time the writer instruction writes the data
and the time the user instruction can use the data.

In addition to instructions, events can be writers and users of CP0
information. For instance, an exception writes information to CP0
registers and events that occur for every instruction, like an instruction
fetch, use CP0 information. Therefore, when manipulating CP0 contents,
the systems programmer must identify hazards and write code that avoids
these hazards.

Table F-1 describes how to identify and avoid hazards, listing instructions
and events that use CP0 registers and the TLB. This table also tells when
written information is available (column 3) and when this latest
information can actually be used (column 2). Exception event writer timing
refers to the instruction identified with the exception; user event timing
information is the pipestage of each instruction during which the user
event uses the data. In the case of a hazard, the number of instructions
required between a writer and user is:

available_stage - (use_stage + 1)

To identify a hazard, look for an instruction/event writer/user pair that
has a required-data dependence and use the timing information in the
table to calculate the delay required between the writer and user. If no
delay is required, there is no hazard. If there is a hazard, place enough
instructions between the writer and user so that the written information is
available or effective when the user needs it.

NOTE: Any instructions inserted between a writer/reader pair with
a hazard must not depend on or modify the data creating the hazard
(for example NOP instructions may be used).

The following steps are used to determine a hazard delay:

1. Find the pipeline stage of the writer instruction in which the result
is available. For example, the MTC0 instruction writes a CP0
general register, and the new value is available at stage 7.

2. Find the pipeline stage in which the user instruction reads or uses
the data item that the writer changes. The TLBWR instruction, for
example, uses different registers through different stages; all source
register values must be stable by stage 5 and remain unchanged
through stage 8.

3. Calculate the number of instructions that must be inserted between
the hazardous pair, by using this formula: available_stage -
(use_stage + 1). For example, with an MTC0/TLBWR pair, MTC0
data is available at stage 7, and TLBWR data must be stable by
stage 5 so the computation is: 7 - (5 + 1) = 1. This means 1
instruction must be inserted between the MTC0 and TLBWR. If the
result of the computation is less than or equal to zero, there is no
hazard and no instructions are required between the pair.
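
The three steps reduce to a single expression, sketched here in Python. The function name `instructions_between` is an assumption for this example; the stage numbers passed to it are taken from the timing table.

```python
def instructions_between(available_stage: int, use_stage: int) -> int:
    """Instructions required between a writer and a user of CP0 data.

    Applies available_stage - (use_stage + 1); a result of zero or less
    means there is no hazard, so zero instructions are needed.
    """
    return max(0, available_stage - (use_stage + 1))
```

For the MTC0/TLBWR pair in step 3, `instructions_between(7, 5)` gives 1. A TLB write whose entry is available at stage 8, followed by an instruction fetch that uses the TLB at stage 2, gives `instructions_between(8, 2)`, which is 5.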
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Table F-1 R4000 Coprocessor 0 Data Writer and User Timing 



Instruction or Event         CP0 Data Used                 Stage     CP0 Data Written               Stage
                                                           Used                                     Available

MTC0 / DMTC0                                                         CPR[0,rd]                      7 γ,δ

MFC0 / DMFC0                 CPR[0,rd]                     4

TLBR                         Index, TLB                    5-7       PageMask, EntryHi,             8
                                                                     EntryLo0, EntryLo1

TLBWI, TLBWR                 Index or Random,              5-8       TLB                            8
                             PageMask, EntryHi,
                             EntryLo0, EntryLo1

TLBP                         PageMask, EntryHi             3-6       Index                          7

ERET                         EPC or ErrorEPC, TLB          4         Status[EXL, ERL]               4-8 α
                             Status                        3         LLbit                          7

Index Load Tag                                                       TagLo, TagHi, ECC              8 β,ε

Index Store Tag              TagLo, TagHi, ECC             8 ε

CACHE Hit ops                                                        Status[CH]                     8 ε

CACHE ops                    cache line (see note)         ε         cache line (see note)          ε

Load/Store                   EntryHi.ASID,                 4
                             Status[KSU, EXL, ERL, RE],
                             Config[K0, DB], TLB
                             Config[SB]                    7
                             WatchHi, WatchLo              4-5

Load/Store exception                                                 EPC, Status, Cause,            8
                                                                     BadVAddr, Context, XContext

Instruction fetch                                                    EPC, Status                    8
exception                                                            Cause, BadVAddr, Context,      4
                                                                     XContext

Instruction fetch            EntryHi[ASID],                0 α
                             Status[KSU, EXL, ERL, RE],
                             Config[K0, IB]
                             Config.SB                     3
                             TLB (mapped addresses)        2

Coproc. usable test          Status[CU, KSU, EXL, ERL]     2

Interrupt signals            Cause[IP],                    3
sampled                      Status[IM, IE, EXL, ERL]

TLB shutdown                                                         Status.TS                      7



EntryHi.ASID refers to the ASID field of the EntryHi register.
Config[K0, DB] refers to the K0 and DB fields of the Config register.

α The EXL and ERL bits in the Status register are permanently
cleared in stage 8, if no exceptions abort the ERET. However the
effect of clearing them is visible to an instruction fetch starting in
stage 4, so the "returned to" instructions use the modified values in
the Status register.

β Only one instruction is needed to separate Index Load Tag and
MFC0 Tag, even though table timing indicates otherwise.

γ An MTC0 of a CPR must not be immediately followed by MFC0 of
the same CPR.

δ With an MTC0 to Status that modifies KSU and sets EXL or ERL, it
is possible for the five instructions following the MTC0 to be
executed incorrectly in the new mode, and not correctly in the
kernel mode. This can be avoided by setting EXL first, and only
later changing the value of KSU.

ε There must be two non-load, non-CACHE instructions between a
store and a CACHE instruction directed to the same primary cache
line as the store.

Table F-2 lists some hazard conditions, and the number of instructions that 
must come between the writer and the user. The table shows the data item 
that creates the hazard, and the calculation for the required number of 
intervening instructions. 



Table F-2 CP0 Hazards and Calculated Delay Times

Writer              →   User                            Hazard On     Instructions   Calculation
                                                                      Between

TLBWR/TLBWI         →   TLBP                            TLB entry     3              8-(4+1)
TLBWR/TLBWI         →   load/store using new TLB        TLB entry     3              8-(4+1)
                        entry
TLBWR/TLBWI         →   I-fetch using new TLB           TLB entry     5              8-(2+1)
                        entry
MTC0 Status[CU]     →   Coprocessor instruction         Status[CU]    4              7-(2+1)
                        needs CU set
TLBR                →   MFC0 EntryHi                    EntryHi       3              8-(4+1)
MTC0 EntryLo0       →   TLBWR/TLBWI                     EntryLo0      1              7-(5+1)
TLBP                →   MFC0 Index                      Index         2              7-(4+1)
MTC0 EntryHi        →   TLBP                            EntryHi       1              7-(5+1)
MTC0 EPC            →   ERET                            EPC           2              7-(4+1)
MTC0 Status         →   ERET                            Status        3              7-(3+1)
MTC0 Status[IE]     →   instruction interrupted†        Status[IE]    3              7-(3+1)

† You cannot depend on a delay in effect if the instruction execution order is changed by exceptions.
In this case, for example, the minimum delay for IE to be effective is the maximum delay before a
pending, enabled interrupt can occur.
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This Appendix shows the pinouts for the three microprocessor 
configurations: R4000PC, R4000SC, and R4000MC. 

NOTE: This entire Appendix, Appendix G, is new for the
second edition.

G.1 Pinout of R4000PC

Figure G-1 shows the physical pinout of the R4000PC. Table G-1 lists 
the signal-to-pin correspondence. 



(Figure: bottom view of the R4000PC package, labeled "R4000 PC Pinout".
The pin grid has columns numbered 1 through 18 and rows lettered, from top
to bottom, V, U, T, R, P, N, M, L, K, J, H, G, F, E, D, C, B, A.)

Figure G-1 R4000PC Physical Pinout

Table G-1 Signal-to-Pin Correspondences for the R4000PC

R4000          PC Pkg    R4000          PC Pkg    R4000          PC Pkg
Function       Pin       Function       Pin       Function       Pin

ColdReset*     T14       ExtRqst*       U2        Fault*         B16
NC             U10       Vcc            T9        IOIn           T13
IOOut          U12       Int*0          N2        Int*1          L3
Int*2          K3        Int*3          J3        Int*4          H3
Int*5          F2        JTCK           H17       JTDI           G16
JTDO           F16       JTMS           E16       MasterClock    J17
MasterOut      P17       ModeClock      B4        ModeIn         U4
NMI*           U7        PLLCap0        †         PLLCap1        †
RClock0        T17       RClock1        R16       RdRdy*         T5
Release*       V5        Reset*         U16       SyncIn         J16
SyncOut        P16       SysAD0         J2        SysAD1         G2
SysAD2         E1        SysAD3         E3        SysAD4         C2
SysAD5         C4        SysAD6         B5        SysAD7         B6
SysAD8         B9        SysAD9         B11       SysAD10        C12
SysAD11        B14       SysAD12        B15       SysAD13        C16
SysAD14        D17       SysAD15        E18       SysAD16        K2
SysAD17        M2        SysAD18        P1        SysAD19        P3
SysAD20        T2        SysAD21        T4        SysAD22        U5
SysAD23        U6        SysAD24        U9        SysAD25        U11
SysAD26        T12       SysAD27        U14       SysAD28        U15
SysAD29        T16       SysAD30        R17       SysAD31        M16
SysAD32        H2        SysAD33        G3        SysAD34        F3
SysAD35        D2        SysAD36        C3        SysAD37        B3
SysAD38        C6        SysAD39        C7        SysAD40        C10
SysAD41        C11       SysAD42        B13       SysAD43        A15
SysAD44        C15       SysAD45        B17       SysAD46        E17
SysAD47        F17       SysAD48        L2        SysAD49        M3
SysAD50        N3        SysAD51        R2        SysAD52        T3
SysAD53        U3        SysAD54        T6        SysAD55        T7
SysAD56        T10       SysAD57        T11       SysAD58        U13
SysAD59        V15       SysAD60        T15       SysAD61        U17
SysAD62        N16       SysAD63        N17       SysADC0        C8

† This node has capacitors for the PLL premounted to the package.



MIPS R4000 Microprocessor User's Manual 



G-3 



Appendix G 



Table G-1 (cont.) Sigml-to-Pin Correspondences for the R4000PC 



R4000 


PCPkg 


R4000 


PCPkg 


R4000 


PCPkg 


Function 


Pin 


Function 


Pin 


Function 


Pin 


SysADCl 


G17 


SysADC2 


T8 


SysADCS 


L16 


SysADC4 


B8 


SysADCS 


H16 


SysADC6 


U8 


SysADC7 


L17 


SysCmdO 


E2 


SysCmdl 


DS 


SysCmd2 


B2 


SysCmdS 


A5 


SysCmd4 


B7 


SysCmdS 


C9 


SysCmd6 


BIO 


SysCmd7 


B12 


SysCmdS 


C13 


SysCmdP 


C14 


TClockO 


C17 


TClockl 


D16 


VCCOk 


Ml 7 


Validin* 


P2 


ValidOut* 


R3 


WrRdy* 


C5 


VccP 


K17 


VssP 


K16 


Vcc 


A2 


Vcc 


A4 


Vcc 


A7 


Vcc 


A9 


Vcc 


All 


Vcc 


A13 


Vcc 


A16 


Vcc 


B18 


Vcc 


CI 


Vcc 


D18 


Vcc 


Fl 


Vcc 


G18 


Vcc 


HI 


Vcc 


J18 


Vcc 


Kl 


Vcc 


L18 


Vcc 


Ml 


Vcc 


N18 


Vcc 


Rl 


Vcc 


T18 


Vcc 


Ul 


Vcc 


V3 


Vcc 


V6 


Vcc 


V8 


Vcc 


VIO 


Vcc 


V12 


Vcc 


V14 


Vcc 


V17 


Vss 


AS 


Vss 


A6 


Vss 


A8 


Vss 


AlO 


Vss 


A12 


Vss 


A14 


Vss 


A17 


Vss 


A18 


Vss 


Bl 


Vss 


C18 


Vss 


Dl 


Vss 


F18 


Vss 


Gl 


Vss 


H18 


Vss 


Jl 


Vss 


K18 


Vss 


LI 


Vss 


M18 


Vss 


Nl 


Vss 


PIS 


Vss 


R18 


Vss 


Tl 


Vss 


U18 


Vss 


VI 


Vss 


V2 


Vss 


V4 


Vss 


V7 


Vss 


V9 


Vss 


Vll 


Vss 


V13 


Vss 


V16 


Vss 


V18 
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G.2 Pinout of R4000MC/SC Package Pinout 

Figure G-2 shows the physical pinout of the R4000MC and SC. Table 
G-2 lists the signal-to-pin correspondence. 



[Figure: R4000 MC/SC 447 Pinout, bottom view. Pin grid with columns lettered A through AW and rows numbered 1 through 39.]

Figure G-2 R4000MC/SC Physical Pinout

Table G-2 Signal-to-Pin Correspondences for the R4000MC/SC



R4000        MC/SC Pkg  R4000        MC/SC Pkg  R4000        MC/SC Pkg
Function     Pin        Function     Pin        Function     Pin

ColdReset*   AW37       ExtRqst*     AV2        Fault*       C39
NC           AV24       Vcc          AV20       IOIn         AV32
IOOut        AV28       Int*0        AL1        IvdAck*†     AA35
IvdErr*†     AA39       JTCK         U39        JTDI         N39
JTDO         J39        JTMS         G37        MasterClock  AA37
MasterOut    AJ39       ModeClock    B8         ModeIn       AV8
NMI*         AV16       PLLCap0‡                PLLCap1‡
RClock0      AM34       RClock1      AL33       RdRdy*       AW7
Release*     AV12       Reset*       AU39       SC64Addr     Y2
SCAPar0      U5         SCAPar1      U1         SCAPar2      P4
SCAddr1      AL5        SCAddr2      AG1        SCAddr3      AE7
SCAddr4      AC1        SCAddr5      AC5        SCAddr6      AC3
SCAddr7      AA1        SCAddr8      AB4        SCAddr9      AA5
SCAddr10     AA7        SCAddr11     AA3        SCAddr12     W3
SCAddr13     Y6         SCAddr14     W5         SCAddr15     W7
SCAddr16     W1         SCAddr17     U3         SCAddr0W     AN7
SCAddr0X     AN5        SCAddr0Y     AM6        SCAddr0Z     AL7
SCDCS*       M6         SCDChk0      G19        SCDChk1      T34
SCDChk2      AP20       SCDChk3      AD34       SCDChk4      C19
SCDChk5      R37        SCDChk6      AU19       SCDChk7      AE37
SCDChk8      C17        SCDChk9      N37        SCDChk10     AU17
SCDChk11     AG37       SCDChk12     E19        SCDChk13     R35
SCDChk14     AR19       SCDChk15     AE35       SCData0      R3
SCData1      R7         SCData2      L5         SCData3      F8
SCData4      C9         SCData5      F12        SCData6      G15
SCData7      E17        SCData8      G21        SCData9      C25
SCData10     G25        SCData11     E29        SCData12     G31
SCData13     C35        SCData14     K36        SCData15     N35
SCData16     AE3        SCData17     AG5        SCData18     AK4
SCData19     AN9        SCData20     AU9        SCData21     AN13
SCData22     AT14       SCData23     AR17       SCData24     AT22
SCData25     AU25       SCData26     AN27       SCData27     AR29
SCData28     AN31       SCData29     AR35       SCData30     AK36
SCData31     AG35       SCData32     T6         SCData33     L3
SCData34     L7         SCData35     E7         SCData36     G11
SCData37     E13        SCData38     E15        SCData39     G17
SCData40     C23        SCData41     F24        SCData42     E27
SCData43     D30        SCData44     C33        SCData45     E35
SCData46     L35        SCData47     R33        SCData48     AF4
SCData49     AJ3        SCData50     AJ7        SCData51     AP8
SCData52     AT10       SCData53     AR13       SCData54     AR15
SCData55     AT18       SCData56     AU23       SCData57     AT26
SCData58     AR27       SCData59     AN29       SCData60     AP32
SCData61     AN35       SCData62     AJ35       SCData63     AE33
SCData64     V4         SCData65     R5         SCData66     N5
SCData67     E5         SCData68     G9         SCData69     E11
SCData70     G13        SCData71     D14        SCData72     C21
SCData73     D22        SCData74     E25        SCData75     G27
SCData76     C31        SCData77     F32        SCData78     J35
SCData79     M34        SCData80     AC7        SCData81     AE5
SCData82     AG7        SCData83     AR5        SCData84     AR9
SCData85     AR11       SCData86     AN15       SCData87     AP16
SCData88     AU21       SCData89     AN23       SCData90     AR25
SCData91     AP28       SCData92     AU31       SCData93     AR33
SCData94     AL35       SCData95     AH34       SCData96     U7
SCData97     N3         SCData98     N7         SCData99     C5
SCData100    E9         SCData101    C11        SCData102    C13
SCData103    F16        SCData104    E21        SCData105    G23
SCData106    C27        SCData107    F28        SCData108    E31
SCData109    G33        SCData110    J37        SCData111    N33
SCData112    AD6        SCData113    AG3        SCData114    AJ5
SCData115    AU5        SCData116    AN11       SCData117    AU11
SCData118    AU13       SCData119    AN17       SCData120    AR21
SCData121    AP24       SCData122    AU27       SCData123    AT30
SCData124    AU33       SCData125    AN33       SCData126    AL37
SCData127    AG33       SCOE*        N1         SCTCS*       J1
SCTChk0      AN21       SCTChk1      AN19       SCTChk2      AU15
SCTChk3      AP12       SCTChk4      AU7        SCTChk5      AR7
SCTChk6      AH6        SCTag0       K4         SCTag1       G7
SCTag2       C7         SCTag3       D10        SCTag4       C15
SCTag5       D18        SCTag6       F20        SCTag7       E23
SCTag8       D26        SCTag9       C29        SCTag10      G29
SCTag11      E33        SCTag12      G35        SCTag13      L33
SCTag14      L37        SCTag15      P36        SCTag16      AF36
SCTag17      AJ37       SCTag18      AJ33       SCTag19      AN37
SCTag20      AU35       SCTag21      AR31       SCTag22      AU29
SCTag23      AN25       SCTag24      AR23       SCWrW*       J5
SCWrX*       J7         SCWrY*       H6         SCWrZ*       G5
Status0      U33        Status1      U35        Status2      V36
Status3      W35        Status4      W37        Status5      AC37
Status6      AC35       Status7      AC33       SyncIn       W39
SyncOut      AN39       SysAD0       T2         SysAD1       M2
SysAD2       J3         SysAD3       G3         SysAD4       C1
SysAD5       A3         SysAD6       A9         SysAD7       A13
SysAD8       A21        SysAD9       A25        SysAD10      A29
SysAD11      A33        SysAD12      B38        SysAD13      E37
SysAD14      G39        SysAD15      L39        SysAD16      AD2
SysAD17      AH2        SysAD18      AL3        SysAD19      AN3
SysAD20      AU1        SysAD21      AW3        SysAD22      AW9
SysAD23      AW13       SysAD24      AW21       SysAD25      AW25
SysAD26      AW29       SysAD27      AW33       SysAD28      AV38
SysAD29      AR37       SysAD30      AM38       SysAD31      AH38
SysAD32      R1         SysAD33      L1         SysAD34      H2
SysAD35      E1         SysAD36      C3         SysAD37      A5
SysAD38      A11        SysAD39      A15        SysAD40      A23
SysAD41      A27        SysAD42      A31        SysAD43      A35
SysAD44      C37        SysAD45      E39        SysAD46      H38
SysAD47      M38        SysAD48      AE1        SysAD49      AJ1
SysAD50      AM2        SysAD51      AR1        SysAD52      AU3
SysAD53      AW5        SysAD54      AW11       SysAD55      AW15
SysAD56      AW23       SysAD57      AW27       SysAD58      AW31
SysAD59      AW35       SysAD60      AU37       SysAD61      AR39
SysAD62      AL39       SysAD63      AG39       SysADC0      A17
SysADC1      R39        SysADC2      AW17       SysADC3      AD38
SysADC4      A19        SysADC5      T38        SysADC6      AW19
SysADC7      AC39       SysCmd0      G1         SysCmd1      E3
SysCmd2      B2         SysCmd3      B12        SysCmd4      B20
SysCmd5      B24        SysCmd6      B28        SysCmd7      B32
SysCmd8      A37        SysCmdP      H34        TClock0      H34
TClock1      J33        VCCOk        AE39       ValidIn*     AN1
ValidOut*    AR3        WrRdy*       A7         VccSense     W33
VssSense     U37        VccP         AA33       VssP         Y34
Vcc          A39        Vcc          B6         Vcc          B10
Vcc          B18        Vcc          B26        Vcc          B34
Vcc          D4         Vcc          D8         Vcc          D16
Vcc          D24        Vcc          D32        Vcc          D36
Vcc          F2         Vcc          F14        Vcc          F22
Vcc          F30        Vcc          F38        Vcc          H4
Vcc          H36        Vcc          K6         Vcc          K38
Vcc          Y38        Vcc          AB2        Vcc          AB34
Vcc          AD4        Vcc          AD36       Vcc          AF6
Vcc          AF38       Vcc          AK2        Vcc          AK34
Vcc          AM4        Vcc          AM36       Vcc          AP2
Vcc          AP10       Vcc          AP18       Vcc          AP26
Vcc          AP38       Vcc          AT4        Vcc          AT8
Vcc          AT16       Vcc          AT24       Vcc          AT32
Vcc          AT36       Vcc          AV6        Vcc          AV14
Vcc          AV22       Vcc          AV30       Vcc          AV34
Vcc          AW1        Vcc          AW39       Vss          B4
Vss          B14        Vss          B22        Vss          B30
Vss          B36        Vss          D2         Vss          D6
Vss          D12        Vss          D20        Vss          D28
Vss          D34        Vss          D38        Vss          F4
Vss          F6         Vss          F10        Vss          F18
Vss          F26        Vss          F34        Vss          F36
Vss          K2         Vss          K34        Vss          M4
Vss          M36        Vss          P6         Vss          P38
Vss          V2         Vss          Y4         Vss          Y36
Vss          AB6        Vss          AB36       Vss          AB38
Vss          AF2        Vss          AF34       Vss          AH4
Vss          AH36       Vss          AK6        Vss          AK38
Vss          AP4        Vss          AP6        Vss          AP14
Vss          AP22       Vss          AP30       Vss          AP34
Vss          AP36       Vss          AT2        Vss          AT6
Vss          AT12       Vss          AT20       Vss          AT28
Vss          AT34       Vss          AT38       Vss          AV4
Vss          AV10       Vss          AV18       Vss          AV26
Vss          AV36

†. Used only in the MC part. Must be tied to Vcc for the SC part.
‡. This node has capacitors for the PLL premounted to the package.

Numerics

32-bit 

addressing 109 
applications 9 
data format 24 
instructions 36 
operands, in 64-bit mode 39 
operations 6, 67 
single-precision FP format 164 
virtual-to-physical-address 
translation 65 

32-bit mode 

address space 31

address translation 65, 95 

addresses 63 

FPU operations 153 

TLB entry format 81 
4th Floor B-dorm. See Alco Hall
64-bit 

addressing 109 

ALU 9 

bus, address and data 201 
data format 24 

double-precision FP format 164 
floating-point registers 156 
FPU 9 

internal data path widths 381 
operations 6, 39, 67 
System interface 11
virtual-to-physical-address 
translation 66 



64-bit mode 

32-bit operands, handling of 39 

address space 31

address translation 66, 95 

addresses 63 

FPU operations 153 

TLB entry format 81 

A 

address acceleration 58 
Address Error exception 127 
address prediction 58 
address space identifier (ASID) 64 
address spaces 

32-bit translation of 65 

64-bit translation of 66 

address space identifier (ASID) 64 

physical 64 

virtual 63 

virtual-to-physical translation of 64 
addressing 

and data formats 24 
big-endian 24 
Kernel mode 73 
little-endian 24 
misaligned data 26 
Supervisor mode 69 
User mode 67 

virtual address translation 95 
See also address spaces 
Alco Hall vs. Acid. See game, softball 
application software, compatibility with 
MIPS R2000, R3000, and R6000 
processors 6 
architecture
64-bit 9 

superpipeline 11 
array, page table entry (PTE) 102 
ASID. See address space identifier

B

Bad Virtual Address register (BadVAddr) 
103 

big-endian, byte addressing 24, 170 
binary fixed-point format 166 
bit definition of 

ERL 68, 69, 73, 109 

EXL 68, 69, 73, 109, 112, 119

IE 109 

KSU 68, 69, 73 

KX 73, 109 

SX 69, 109 

UX 68, 109 
boot-mode settings 222 
boundary scanning 390
Boundary-scan register 394 
branch delay 48 
branch instructions, CPU 15, 41 
branch instructions, FPU 170 
Breakpoint exception 138 
Bus Error exception 134 
Bypass register 393
byte addressing 

big-endian 24, 170 

little-endian 24, 170 
byte ordering 24 

big-endian 24 

in doublewords 25 

little-endian 24 

C

cache 33 

Cache Error (CacheErr) register 116 
Cache Error exception 132 
Cache Error exception process 120 
caches 

attributes 
clean 255 

clean exclusive 256 
dirty 255 

dirty exclusive 256 



dirty shared 255 

exclusive 255 

invalid 255 

shared 255 
coherency 

attributes 264 

conflicts 271-285 

maintaining coherency on load 

and store operations 269 

protocol, overview 264 

synchronization 286
description 246 
line ownership 258 
manipulation by an external agent 270 
mapping states between caches 257 
memory hierarchy 32, 244 
misses 

address prediction 58 

handling 49 

performance considerations 58 

pipeline back-up 54 
on-chip instruction and data caches 33 
on-chip primary caches 33, 246 
operation modes 266 
optional external secondary cache 32 
ordering constraints 267 
overview of operations 245 
primary cache, states 256 
primary data cache 

accessing 251

line size 250 
primary instruction cache 

accessing 251

line size 249 
secondary cache 

accessing 254 

line size 252 

organization 252 

states 256
Secondary Cache interface 33, 379

See also Secondary Cache interface 
secondary cache sizes 248 
state diagrams 260 
strong ordering 

example of 267 

testing for 267 
terminology 243 
write-back policy 259 
Cause register 110 
central processing unit (CPU) 
cache memory hierarchy 32 
data formats and addressing 24 
exception processing 99 

See also exception processing, CPU 
features 6-33 
instruction formats 14, 36 
instruction pipeline, basic operation 
43 

See also pipeline, CPU 
instruction set 

extensions 16 

overview 14, 35 

types of instructions 15
instructions. See instructions, CPU 
interrupts 401 

See also interrupts, CPU 
memory management 31

See also memory management 
memory organization 244 
operating modes 32 
registers 12 

See also registers, CPU 
System Control Coprocessor (CP0)
27, 80

See also System Control 
Coprocessor 
System interface 293 

See also System interface 
transfers between FPU and CPU 169 
CISC. See complex instruction set 

computer 



ckseg0 79
ckseg1 79
ckseg3 79 
cksseg 79 

Class of 73. See 4th Floor 
B-dorm 

clean exclusive, cache attribute 256 
clean, cache attribute 255 
Clock interface 

connecting clocks 

to CMOS logic system 238 
to gate-array device 235 
to phase-locked system 234 
to system without phase locking 
235 

signals 203, 227 

status outputs 241 

system timing parameters 233 
clocks, system 229 
coherency. See caches, coherency 
cold reset 214 

compare instructions, FPU 171 
Compare register 104 
compatibility 

application software, with MIPS 
R2000, R3000, and R6000 
processors 6 

DEC VAX 24 

iAPX x86 24 

IBM 370 24 

MC68000 24 
compilers, MIPS suite of 5 
complex instruction set computer (CISC) 

compared with RISC, in languages 
used 4 

historical context 1-2 
computational instructions, CPU 15 

64-bit operations 39 

cycle timing for multiply and divide 
instructions 40 

formats 39
computational instructions, FPU

floating-point 170 
Config register 90 
Context register 102 
Control/Status register, FPU 157, 159 
conversion instructions, FPU 170 
coprocessor instructions 15, 42 
Coprocessor Unusable exception 140 
correctness considerations 58 
Count register 103 
CP0. See System Control Coprocessor
CPU. See central processing unit 
csseg 72 

cycle time, interlock and exception 
handling 53 

D 

data alignment 170 
Data Fetch, First Half (DF) 46
Data Fetch, Second Half (DS) 46 
data formats 

and addressing 24 
byte ordering 24 
data identifiers 364 
data transmission errors, ECC 
detecting 418 
types of

double data bit 422 

four data bit 424 

single check bit 421 

single data bit 420 

three data bit 423 
DEC VAX, compatibility with 24 
delayed load instruction 37 
design cycles, RISC vs. CISC 3